Welcome

This R Markdown document summarises the main figures and data in the ErythroCite database. The database was compiled using a systematic map approach to gather published information on the size of fish red blood cells (erythrocytes).

Summary

Size is a fundamental trait in biology, and cell size plays a key role in cellular functions, influencing physiological adaptations and evolutionary processes in living organisms. For decades, scientists have been fascinated by the considerable variation in cell sizes among animals, yet systematic efforts to compile such data have been scarce. To address this gap, we employed a systematic map approach to create ErythroCite, an open-source database of fish erythrocyte sizes. This comprehensive resource encompasses 1,764 records from 660 species among four major lineages: Actinopterygii, Chondrichthyes, Dipnoi, and Cyclostomata. Our findings reveal a remarkable 414-fold range in cell volume, with most studies focusing on bony fishes and limited data on juveniles and earlier life stages. Life stage and sex were infrequently reported, but the available data showed equal representation of adult females and males. ErythroCite offers valuable insights for studies in macroecology, macrophysiology, comparative physiology, evolutionary biology and cell biology. We anticipate this resource will facilitate comparative approaches and meta-analyses, globally driving further exploration of erythrocyte diversity and function in fish.

Citation

When using the data and/or code associated with this project, they should be cited as follows:

  • Leiva, F. P., Molina-Venegas, R., Alter, K., Freire, C. A., Hendriks, A. J., Hermaniuk, A., Serre-Fredj, L., Shokri, M., Czarnoleski, M., & Mark, F. C. (2025). ErythroCite: A systematic map and open-source database on red blood cell size of fishes.

  • Leiva, F. P., Molina-Venegas, R., Alter, K., Freire, C. A., Hendriks, A. J., Hermaniuk, A., Serre-Fredj, L., Shokri, M., Czarnoleski, M., & Mark, F. C. (2025). ErythroCite: A systematic map and open-source database on red blood cell size of fishes. Zenodo. https://doi.org/10.5281/zenodo.14781325.

Contact

This script was written by Félix P. Leiva. For any questions related to this resource, please contact me at felixpleiva@gmail.com.

Disclaimer

This code may contain typographical errors, and some lines of code or comments may be written in Spanish (my native language). I apologise for any inconvenience this might cause in understanding the code.

I will correct any identified errors in the GitHub repository when appropriate. I therefore strongly recommend that users check the repository where the data are stored: https://github.com/felixpleiva/ErythroCite. This ensures access to the most current version of the code and data.

Should you encounter any errors in the code or data, please let me know via email.

Thank you!

Licence

This repository is provided by the author under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International licence (CC BY-NC-ND 4.0).

Clean working space

rm(list = ls())

Load libraries

library(kableExtra)       # Enhances tables created with 'knitr::kable'
library(DataExplorer)     # Automates exploratory data analysis
library(dplyr)            # Efficient data manipulation
library(ggplot2)          # Data visualisation based on the grammar of graphics
library(RefManageR)       # Manages references and citations
library(ggpubr)           # Creates publication-ready graphics
library(cowplot)          # Arranges and annotates plots
library(tidygeocoder)     # Converts addresses into geographic coordinates
library(rnaturalearth)    # Accesses Natural Earth geographic data
library(ape)              # Analyses phylogenies and evolution
library(ggtree)           # Visualises and annotates phylogenetic trees
library(tibble)           # Alternative to data frames
library(ggthemes)         # Additional themes for 'ggplot2' graphics
library(fishualize)       # Fish-inspired colour palettes
library(sessioninfo)      # Documents session environment for reproducibility
library(details)          # Adds inline or interactive details
library(rfishbase)        # Retrieve taxonomy from FishBase (https://www.fishbase.se)
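
Before running the library() calls above, it can help to confirm that every package is installed. The following is a minimal sketch rather than part of the original routine: the vector pkgs simply repeats the package names listed above, and missing_pkgs is a name introduced here for illustration. Note that ggtree is distributed through Bioconductor rather than CRAN, so it may need a different installation route than the other packages.

```r
# Package names copied from the library() calls in this document
pkgs <- c("kableExtra", "DataExplorer", "dplyr", "ggplot2", "RefManageR",
          "ggpubr", "cowplot", "tidygeocoder", "rnaturalearth", "ape",
          "ggtree", "tibble", "ggthemes", "fishualize", "sessioninfo",
          "details", "rfishbase")

# requireNamespace() returns TRUE or FALSE without attaching the package
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Report any packages that still need to be installed
missing_pkgs <- pkgs[!installed]
if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
}
```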

Import data

The associated data, as well as supplementary files, were directly imported from:

GitHub: https://github.com/felixpleiva/ErythroCite/

Zenodo: https://doi.org/10.5281/zenodo.14781325

Load data

dat <- read.csv("../outputs/cell_size_with_taxonomy.csv")

Load phylogenetic tree

tree <- read.tree("../outputs/Phylogenetic tree for 650 species included in ErythroCite.tre")

Load references

refs <- ReadBib("../outputs/ErythroCite literature.bib")
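
The tree file name refers to 650 species, whereas the summary reports 660 species in the database, so some species in the data may have no matching tip in the tree. A minimal sketch of how such a mismatch could be listed is shown here. It uses toy vectors as stand-ins for dat$species_underscored and tree$tip.label, and it assumes (this is not confirmed by the source) that tip labels follow the same underscored format as the species_underscored column.

```r
# Toy stand-ins (hypothetical) for dat$species_underscored and tree$tip.label
data_species <- c("Abalistes_stellatus", "Abramis_brama",
                  "Abudefduf_saxatilis", "Abudefduf_saxatilis")
tree_tips <- c("Abalistes_stellatus", "Abramis_brama")

# Species present in the data but absent from the tree, and the reverse
in_data_not_tree <- setdiff(unique(data_species), tree_tips)
in_tree_not_data <- setdiff(tree_tips, unique(data_species))

in_data_not_tree
```

With the real objects, the same two setdiff() calls would take unique(dat$species_underscored) and tree$tip.label as inputs.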

Data exploration

Check and reformat variables

str(dat)
## 'data.frame':    1765 obs. of  45 variables:
##  $ species_reported    : chr  "Abalistes stellatus" "Abramis brama" "Abudefduf marginatus" "Abudefduf saxatilis" ...
##  $ double_checked      : chr  "YES" "YES" "YES" "YES" ...
##  $ database            : chr  "Gregory_2024" "Gregory_2024" "systematic_search_english" "felix" ...
##  $ key                 : chr  "9_gregory" "pdf_not_found_Gulliver_1875" "rayyan-33186100" "16_fpl" ...
##  $ body_mass_gram      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ sex                 : chr  NA NA NA NA ...
##  $ life_stage          : chr  NA NA NA NA ...
##  $ lat_dec             : num  NA NA 32.4 18.2 18.2 ...
##  $ long_dec            : num  NA NA -64.7 -66.5 -66.5 ...
##  $ location_description: chr  NA NA "near the Bermuda Biological Station, St. George, Bermuda\n" "southwestern and western coasts of Puerto Rico" ...
##  $ sample_size         : int  NA NA 50 NA NA NA NA NA NA NA ...
##  $ number_of_specimens : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ estimate_error_type : chr  NA NA "2SE" NA ...
##  $ cell_length         : num  NA 10.6 NA 10.2 10 10.5 10.5 10 9 9 ...
##  $ cell_length_error   : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ cell_width          : num  NA 7.1 NA 7.5 7.5 7.5 7.5 7.5 6.5 5.5 ...
##  $ cell_width_error    : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ cell_area           : num  43.9 59.4 NA NA NA ...
##  $ cell_area_error     : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ cell_volume         : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ cell_volume_error   : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ mcv                 : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ mcv_error           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ nucleus_length      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ nucleus_length_error: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ nucleus_width       : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ nucleus_width_error : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ nucleus_area        : num  7.36 NA 4.1 NA NA NA NA NA NA NA ...
##  $ nucleus_area_error  : num  NA NA 0.09 NA NA NA NA NA NA NA ...
##  $ nucleus_volume      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ nucleus_volume_error: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ notes               : chr  "Hardie, D.C. and P.D.N. Hebert (2003). The nucleotypic effects of cellular DNA content in cartilaginous and ray"| __truncated__ "Gulliver, G. (1875). Observations on the sizes and shapes of the red corpuscles of the blood of vertebrates, wi"| __truncated__ "Table 1" "Saunders, D.C. (1966). Differential Blood Cell Counts of 121 Species of Marine Fishes of Puerto Rico" ...
##  $ phylum              : chr  "Chordata" "Chordata" "Chordata" "Chordata" ...
##  $ class               : chr  "Actinopterygii" "Actinopterygii" "Actinopterygii" "Actinopterygii" ...
##  $ order               : chr  "Tetraodontiformes" "Cypriniformes" "Perciformes" "Perciformes" ...
##  $ family              : chr  "Balistidae" "Leuciscidae" "Pomacentridae" "Pomacentridae" ...
##  $ genus               : chr  "Abalistes" "Abramis" "Abudefduf" "Abudefduf" ...
##  $ species             : chr  "Abalistes stellatus" "Abramis brama" "Abudefduf saxatilis" "Abudefduf saxatilis" ...
##  $ source              : chr  "ncbi" "ncbi" "gbif" "ncbi" ...
##  $ taxo_level          : chr  "Species" "Species" "Species" "Species" ...
##  $ isMarine            : int  1 0 1 1 1 1 1 1 1 1 ...
##  $ isBrackish          : int  0 1 0 0 0 0 0 0 0 0 ...
##  $ isFresh             : int  0 1 0 0 0 0 0 0 0 0 ...
##  $ realm               : chr  "marine" "freshwater-brackish" "marine" "marine" ...
##  $ species_underscored : chr  "Abalistes_stellatus" "Abramis_brama" "Abudefduf_saxatilis" "Abudefduf_saxatilis" ...
dat$sex <- as.factor(dat$sex)
dat$life_stage <- as.factor(dat$life_stage)
dat$sample_size <- as.factor(dat$sample_size)
dat$number_of_specimens <- as.factor(dat$number_of_specimens)
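
Converting sample_size and number_of_specimens to factors makes summary() tabulate their most frequent values, but a factor stores level codes rather than the original numbers. The sketch here, using toy values that are not taken from the database, shows why as.numeric() alone does not recover the counts and how to convert back if the numeric values are needed later.

```r
# Toy values (hypothetical), mimicking an integer column with missing entries
sample_size <- c(NA, 50L, 30L, 5L)
f <- as.factor(sample_size)

# as.numeric() on a factor returns the internal level codes, not the values
codes <- as.numeric(f)

# Going through as.character() first recovers the original numbers
values <- as.numeric(as.character(f))

codes
values
```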

Check again

str(dat)
## 'data.frame':    1765 obs. of  45 variables:
##  $ species_reported    : chr  "Abalistes stellatus" "Abramis brama" "Abudefduf marginatus" "Abudefduf saxatilis" ...
##  $ double_checked      : chr  "YES" "YES" "YES" "YES" ...
##  $ database            : chr  "Gregory_2024" "Gregory_2024" "systematic_search_english" "felix" ...
##  $ key                 : chr  "9_gregory" "pdf_not_found_Gulliver_1875" "rayyan-33186100" "16_fpl" ...
##  $ body_mass_gram      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ sex                 : Factor w/ 3 levels "both","female",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ life_stage          : Factor w/ 3 levels "adult","fingerlings",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ lat_dec             : num  NA NA 32.4 18.2 18.2 ...
##  $ long_dec            : num  NA NA -64.7 -66.5 -66.5 ...
##  $ location_description: chr  NA NA "near the Bermuda Biological Station, St. George, Bermuda\n" "southwestern and western coasts of Puerto Rico" ...
##  $ sample_size         : Factor w/ 37 levels "1","2","3","4",..: NA NA 22 NA NA NA NA NA NA NA ...
##  $ number_of_specimens : Factor w/ 45 levels "1","2","3","4",..: NA NA NA NA NA NA NA NA NA NA ...
##  $ estimate_error_type : chr  NA NA "2SE" NA ...
##  $ cell_length         : num  NA 10.6 NA 10.2 10 10.5 10.5 10 9 9 ...
##  $ cell_length_error   : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ cell_width          : num  NA 7.1 NA 7.5 7.5 7.5 7.5 7.5 6.5 5.5 ...
##  $ cell_width_error    : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ cell_area           : num  43.9 59.4 NA NA NA ...
##  $ cell_area_error     : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ cell_volume         : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ cell_volume_error   : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ mcv                 : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ mcv_error           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ nucleus_length      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ nucleus_length_error: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ nucleus_width       : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ nucleus_width_error : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ nucleus_area        : num  7.36 NA 4.1 NA NA NA NA NA NA NA ...
##  $ nucleus_area_error  : num  NA NA 0.09 NA NA NA NA NA NA NA ...
##  $ nucleus_volume      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ nucleus_volume_error: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ notes               : chr  "Hardie, D.C. and P.D.N. Hebert (2003). The nucleotypic effects of cellular DNA content in cartilaginous and ray"| __truncated__ "Gulliver, G. (1875). Observations on the sizes and shapes of the red corpuscles of the blood of vertebrates, wi"| __truncated__ "Table 1" "Saunders, D.C. (1966). Differential Blood Cell Counts of 121 Species of Marine Fishes of Puerto Rico" ...
##  $ phylum              : chr  "Chordata" "Chordata" "Chordata" "Chordata" ...
##  $ class               : chr  "Actinopterygii" "Actinopterygii" "Actinopterygii" "Actinopterygii" ...
##  $ order               : chr  "Tetraodontiformes" "Cypriniformes" "Perciformes" "Perciformes" ...
##  $ family              : chr  "Balistidae" "Leuciscidae" "Pomacentridae" "Pomacentridae" ...
##  $ genus               : chr  "Abalistes" "Abramis" "Abudefduf" "Abudefduf" ...
##  $ species             : chr  "Abalistes stellatus" "Abramis brama" "Abudefduf saxatilis" "Abudefduf saxatilis" ...
##  $ source              : chr  "ncbi" "ncbi" "gbif" "ncbi" ...
##  $ taxo_level          : chr  "Species" "Species" "Species" "Species" ...
##  $ isMarine            : int  1 0 1 1 1 1 1 1 1 1 ...
##  $ isBrackish          : int  0 1 0 0 0 0 0 0 0 0 ...
##  $ isFresh             : int  0 1 0 0 0 0 0 0 0 0 ...
##  $ realm               : chr  "marine" "freshwater-brackish" "marine" "marine" ...
##  $ species_underscored : chr  "Abalistes_stellatus" "Abramis_brama" "Abudefduf_saxatilis" "Abudefduf_saxatilis" ...
head(dat)
##       species_reported double_checked                  database
## 1  Abalistes stellatus            YES              Gregory_2024
## 2        Abramis brama            YES              Gregory_2024
## 3 Abudefduf marginatus            YES systematic_search_english
## 4  Abudefduf saxatilis            YES                     felix
## 5  Abudefduf saxatilis            YES                     felix
## 6  Abudefduf saxatilis            YES                     felix
##                           key body_mass_gram  sex life_stage  lat_dec  long_dec
## 1                   9_gregory             NA <NA>       <NA>       NA        NA
## 2 pdf_not_found_Gulliver_1875             NA <NA>       <NA>       NA        NA
## 3             rayyan-33186100             NA <NA>       <NA> 32.36700 -64.69760
## 4                      16_fpl             NA <NA>       <NA> 18.22477 -66.48583
## 5                      16_fpl             NA <NA>       <NA> 18.22477 -66.48583
## 6                      16_fpl             NA <NA>       <NA> 18.22477 -66.48583
##                                         location_description sample_size
## 1                                                       <NA>        <NA>
## 2                                                       <NA>        <NA>
## 3 near the Bermuda Biological Station, St. George, Bermuda\n          50
## 4             southwestern and western coasts of Puerto Rico        <NA>
## 5             southwestern and western coasts of Puerto Rico        <NA>
## 6             southwestern and western coasts of Puerto Rico        <NA>
##   number_of_specimens estimate_error_type cell_length cell_length_error
## 1                <NA>                <NA>          NA                NA
## 2                <NA>                <NA>        10.6                NA
## 3                <NA>                 2SE          NA                NA
## 4                <NA>                <NA>        10.2                NA
## 5                <NA>                <NA>        10.0                NA
## 6                <NA>                <NA>        10.5                NA
##   cell_width cell_width_error cell_area cell_area_error cell_volume
## 1         NA               NA     43.91              NA          NA
## 2        7.1               NA     59.39              NA          NA
## 3         NA               NA        NA              NA          NA
## 4        7.5               NA        NA              NA          NA
## 5        7.5               NA        NA              NA          NA
## 6        7.5               NA        NA              NA          NA
##   cell_volume_error mcv mcv_error nucleus_length nucleus_length_error
## 1                NA  NA        NA             NA                   NA
## 2                NA  NA        NA             NA                   NA
## 3                NA  NA        NA             NA                   NA
## 4                NA  NA        NA             NA                   NA
## 5                NA  NA        NA             NA                   NA
## 6                NA  NA        NA             NA                   NA
##   nucleus_width nucleus_width_error nucleus_area nucleus_area_error
## 1            NA                  NA         7.36                 NA
## 2            NA                  NA           NA                 NA
## 3            NA                  NA         4.10               0.09
## 4            NA                  NA           NA                 NA
## 5            NA                  NA           NA                 NA
## 6            NA                  NA           NA                 NA
##   nucleus_volume nucleus_volume_error
## 1             NA                   NA
## 2             NA                   NA
## 3             NA                   NA
## 4             NA                   NA
## 5             NA                   NA
## 6             NA                   NA
##                                                                                                                                                                                                                                                                    notes
## 1                                                                                                                     Hardie, D.C. and P.D.N. Hebert (2003). The nucleotypic effects of cellular DNA content in cartilaginous and ray-finned fishes. Genome 46: 683-706.
## 2 Gulliver, G. (1875). Observations on the sizes and shapes of the red corpuscles of the blood of vertebrates, with drawings of them to a uniform scale, and extended and revised tables of measurements. Proceedings of the Zoological Society of London 1875: 474-495.
## 3                                                                                                                                                                                                                                                                Table 1
## 4                                                                                                                                                                   Saunders, D.C. (1966). Differential Blood Cell Counts of 121 Species of Marine Fishes of Puerto Rico
## 5                                                                                                                                                                   Saunders, D.C. (1966). Differential Blood Cell Counts of 121 Species of Marine Fishes of Puerto Rico
## 6                                                                                                                                                                   Saunders, D.C. (1966). Differential Blood Cell Counts of 121 Species of Marine Fishes of Puerto Rico
##     phylum          class             order        family     genus
## 1 Chordata Actinopterygii Tetraodontiformes    Balistidae Abalistes
## 2 Chordata Actinopterygii     Cypriniformes   Leuciscidae   Abramis
## 3 Chordata Actinopterygii       Perciformes Pomacentridae Abudefduf
## 4 Chordata Actinopterygii       Perciformes Pomacentridae Abudefduf
## 5 Chordata Actinopterygii       Perciformes Pomacentridae Abudefduf
## 6 Chordata Actinopterygii       Perciformes Pomacentridae Abudefduf
##               species source taxo_level isMarine isBrackish isFresh
## 1 Abalistes stellatus   ncbi    Species        1          0       0
## 2       Abramis brama   ncbi    Species        0          1       1
## 3 Abudefduf saxatilis   gbif    Species        1          0       0
## 4 Abudefduf saxatilis   ncbi    Species        1          0       0
## 5 Abudefduf saxatilis   ncbi    Species        1          0       0
## 6 Abudefduf saxatilis   ncbi    Species        1          0       0
##                 realm species_underscored
## 1              marine Abalistes_stellatus
## 2 freshwater-brackish       Abramis_brama
## 3              marine Abudefduf_saxatilis
## 4              marine Abudefduf_saxatilis
## 5              marine Abudefduf_saxatilis
## 6              marine Abudefduf_saxatilis

Inspection of the data set

These checks follow the code of Pottier et al. (2021), "Sexual (in)equality? A meta-analysis of sex differences in thermal acclimation capacity across ectotherms" (also cited in the main text).

kable(summary(dat), "html") %>%
  kable_styling("striped", position = "left") %>%
  scroll_box(width = "100%", height = "500px")
species_reported double_checked database key body_mass_gram sex life_stage lat_dec long_dec location_description sample_size number_of_specimens estimate_error_type cell_length cell_length_error cell_width cell_width_error cell_area cell_area_error cell_volume cell_volume_error mcv mcv_error nucleus_length nucleus_length_error nucleus_width nucleus_width_error nucleus_area nucleus_area_error nucleus_volume nucleus_volume_error notes phylum class order family genus species source taxo_level isMarine isBrackish isFresh realm species_underscored
Length:1765 Length:1765 Length:1765 Length:1765 Min. : 0.51 both : 98 adult : 204 Min. :-42.39 Min. :-107.31 Length:1765 30 : 30 10 : 24 Length:1765 Min. : 5.51 Min. :0.0090 Min. : 3.740 Min. : 0.008 Min. : 16.22 Min. : 0.120 Min. : 41.08 Min. : 1.438 Min. : 0.019 Min. : 0.0014 Min. : 2.240 Min. :0.0100 Min. :1.440 Min. :0.0200 Min. : 2.56 Min. : 0.030 Min. : 3.42 Min. : 0.100 Length:1765 Length:1765 Length:1765 Length:1765 Length:1765 Length:1765 Length:1765 Length:1765 Length:1765 Min. :0.0000 Min. :0.0000 Min. :0.0000 Length:1765 Length:1765
Class :character Class :character Class :character Class :character 1st Qu.: 25.68 female: 47 fingerlings: 9 1st Qu.: 18.22 1st Qu.: -66.49 Class :character 1 : 25 20 : 14 Class :character 1st Qu.: 9.50 1st Qu.:0.2075 1st Qu.: 6.500 1st Qu.: 0.220 1st Qu.: 58.05 1st Qu.: 2.435 1st Qu.: 286.00 1st Qu.:11.700 1st Qu.: 107.025 1st Qu.: 2.8162 1st Qu.: 4.400 1st Qu.:0.2000 1st Qu.:3.000 1st Qu.:0.1500 1st Qu.: 10.84 1st Qu.: 0.500 1st Qu.: 18.75 1st Qu.: 1.173 Class :character Class :character Class :character Class :character Class :character Class :character Class :character Class :character Class :character 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 Class :character Class :character
Mode :character Mode :character Mode :character Mode :character Median : 78.10 male : 41 juvenile : 72 Median : 18.22 Median : -66.49 Mode :character 20 : 24 30 : 14 Mode :character Median :10.00 Median :0.4150 Median : 7.500 Median : 0.340 Median : 70.38 Median : 4.250 Median : 579.52 Median :23.000 Median : 150.738 Median : 11.8000 Median : 5.250 Median :0.3200 Median :3.500 Median :0.2400 Median : 14.03 Median : 1.190 Median : 39.19 Median : 2.000 Mode :character Mode :character Mode :character Mode :character Mode :character Mode :character Mode :character Mode :character Mode :character Median :1.0000 Median :0.0000 Median :0.0000 Mode :character Mode :character
NA NA NA NA Mean : 1802.84 NA’s :1579 NA’s :1480 Mean : 20.39 Mean : -10.25 NA 50 : 24 8 : 12 NA Mean :11.15 Mean :0.6021 Mean : 7.791 Mean : 2.692 Mean : 94.99 Mean : 6.717 Mean : 569.99 Mean :26.096 Mean : 201.688 Mean : 19.4054 Mean : 5.427 Mean :0.3815 Mean :3.739 Mean :0.2902 Mean : 18.91 Mean : 1.865 Mean : 70.10 Mean : 2.625 NA NA NA NA NA NA NA NA NA Mean :0.6634 Mean :0.4776 Mean :0.4593 NA NA
NA NA NA NA 3rd Qu.: 203.12 NA NA 3rd Qu.: 30.06 3rd Qu.: 76.95 NA 5 : 22 1 : 11 NA 3rd Qu.:11.80 3rd Qu.:0.8575 3rd Qu.: 8.408 3rd Qu.: 0.690 3rd Qu.: 89.19 3rd Qu.: 6.700 3rd Qu.: 693.54 3rd Qu.:37.550 3rd Qu.: 204.750 3rd Qu.: 24.0250 3rd Qu.: 6.120 3rd Qu.:0.5000 3rd Qu.:4.225 3rd Qu.:0.4000 3rd Qu.: 19.59 3rd Qu.: 2.178 3rd Qu.:105.73 3rd Qu.: 3.683 NA NA NA NA NA NA NA NA NA 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 NA NA
NA NA NA NA Max. :217271.00 NA NA Max. : 60.95 Max. : 146.49 NA (Other): 91 (Other): 144 NA Max. :44.60 Max. :2.9530 Max. :27.000 Max. :259.334 Max. :944.70 Max. :66.600 Max. :1889.41 Max. :78.890 Max. :6940.000 Max. :293.0000 Max. :17.500 Max. :1.4260 Max. :9.750 Max. :1.2000 Max. :157.33 Max. :14.900 Max. :307.64 Max. :11.800 NA NA NA NA NA NA NA NA NA Max. :1.0000 Max. :1.0000 Max. :1.0000 NA NA
NA NA NA NA NA’s :1283 NA NA NA’s :525 NA’s :525 NA NA’s :1549 NA’s :1546 NA NA’s :867 NA’s :1649 NA’s :867 NA’s :1648 NA’s :1028 NA’s :1671 NA’s :1650 NA’s :1724 NA’s :1463 NA’s :1501 NA’s :1598 NA’s :1660 NA’s :1598 NA’s :1660 NA’s :1307 NA’s :1651 NA’s :1598 NA’s :1697 NA NA NA NA NA NA NA NA NA NA’s :18 NA’s :21 NA’s :21 NA NA
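
The NA counts in the summary above are easier to compare when expressed as proportions of missing values per column. This is a minimal sketch using base R on a toy data frame built from the first four rows shown in head(dat). The names toy and prop_missing are introduced here for illustration, and with the full data set the same call would be applied to dat.

```r
# First four rows of three columns, as printed by head(dat)
toy <- data.frame(
  cell_length = c(NA, 10.6, NA, 10.2),
  cell_width  = c(NA, 7.1, NA, 7.5),
  cell_area   = c(43.91, 59.39, NA, NA)
)

# is.na() gives a logical matrix; column means are the missing proportions
prop_missing <- colMeans(is.na(toy))

# Columns with the most missing data first
sort(prop_missing, decreasing = TRUE)
```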

We assessed a range of cellular parameters across multiple studies, including cell area, cell volume, nucleus area, nucleus volume, and mean corpuscular volume (MCV). By comparing the mean, minimum, maximum, and standard deviation of each metric per study, we aim to identify potential outliers and determine whether extreme values are predominantly associated with specific studies.
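
One detail matters when reading per-study summaries: mean(), min(), max() and sd() return NA for a study if any of its values is missing, unless na.rm = TRUE is set. The sketch here shows an alternative that drops missing values before summarising. It is an illustration under stated assumptions and not the routine used in this document: the study keys and the helper summarise_study() are hypothetical, and only the value 6940 corresponds to an MCV reported in the database.

```r
# Toy data: hypothetical study keys, with missing MCV values in both studies
toy <- data.frame(
  key = c("study_a", "study_a", "study_a", "study_b", "study_b"),
  mcv = c(120, NA, 150, 6940, NA)
)

# Summarise one study's values after dropping NAs;
# return NA summaries if the study reported no values at all
summarise_study <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) {
    return(c(mean = NA, min = NA, max = NA, n_reported = 0))
  }
  c(mean = mean(x), min = min(x), max = max(x), n_reported = length(x))
}

# One row per study, one column per summary statistic
by_study <- t(sapply(split(toy$mcv, toy$key), summarise_study))
by_study
```

Keeping n_reported alongside the summaries makes it clear when an extreme mean rests on a single record, as it does for study_b here.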

Check for extreme values

kable(dat %>%
        group_by(key) %>%
        summarise(mean_cell_area = mean(cell_area), max_cell_area = max(cell_area), min_cell_area = min(cell_area), sd_cell_area = sd(cell_area),
                  mean_cell_volume = mean(cell_volume), max_cell_volume = max(cell_volume), min_cell_volume = min(cell_volume), sd_cell_volume = sd(cell_volume),
                  mean_nucleus_area = mean(nucleus_area), max_nucleus_area = max(nucleus_area), min_nucleus_area = min(nucleus_area), sd_nucleus_area = sd(nucleus_area),
                  mean_nucleus_volume = mean(nucleus_volume), max_nucleus_volume = max(nucleus_volume), min_nucleus_volume = min(nucleus_volume), sd_nucleus_volume = sd(nucleus_volume),
                  mean_mcv = mean(mcv), max_mcv = max(mcv), min_mcv = min(mcv), sd_mcv = sd(mcv), 
                  n = n())) %>%
  kable_styling("striped", position = "left") %>%
  scroll_box(width = "100%", height = "500px")
key mean_cell_area max_cell_area min_cell_area sd_cell_area mean_cell_volume max_cell_volume min_cell_volume sd_cell_volume mean_nucleus_area max_nucleus_area min_nucleus_area sd_nucleus_area mean_nucleus_volume max_nucleus_volume min_nucleus_volume sd_nucleus_volume mean_mcv max_mcv min_mcv sd_mcv n
11_fpl 73.20000 73.20000 73.20000 NA NA NA NA NA 13.300000 13.300000 13.300000 NA NA NA NA NA 163.80000 163.80000 163.80000 NA 1
12_fpl NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 294.95000 529.50000 139.70000 156.4641013 6
13_gregory 123.48071 303.98000 37.70000 93.6114967 NA NA NA NA 18.417143 42.190000 7.760000 11.094015 NA NA NA NA NA NA NA NA 14
14_fpl NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 1
15_fpl NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 3
16_fpl NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 601
17_gregory NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 12
18_fpl NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 4
1_fpl NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 1
2_fpl 75.60000 75.60000 75.60000 NA NA NA NA NA 12.000000 12.000000 12.000000 NA NA NA NA NA NA NA NA NA 1
3_gregory NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 13
5_fpl NA NA NA NA NA NA NA NA 19.060000 21.600000 10.500000 3.266054 NA NA NA NA NA NA NA NA 10
6_gregory NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 7
9_gregory 99.62252 639.02000 32.34000 102.7547647 NA NA NA NA 21.001261 157.330000 5.460000 24.679091 NA NA NA NA NA NA NA NA 222
pdf_not_found_Gulliver_1875 127.30679 944.70000 42.80000 144.6876504 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 81
pdf_not_found_Kisch_1949a NA NA NA NA NA NA NA NA 19.067273 38.170000 7.480000 11.493088 NA NA NA NA NA NA NA NA 11
pdf_not_found_Kisch_1949b 123.51750 245.26000 64.87000 83.0318157 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 4
pdf_not_found_Kisch_1951 207.45000 542.99000 75.76000 174.4467711 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 6
pdf_not_found_Potter_et_al_1982 117.18000 128.68000 105.68000 16.2634560 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 2
pdf_not_found_Wintrobe_1933 138.54600 390.34000 51.44000 111.6304454 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 20
rayyan-33184974 63.63636 73.47000 44.88000 7.3568126 NA NA NA NA 21.515455 29.170000 13.960000 4.179964 NA NA NA NA NA NA NA NA 22
rayyan-33185067 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 134.14000 139.04000 129.24000 6.9296465 2
rayyan-33185224 98.87500 102.96000 94.79000 5.7770624 NA NA NA NA 17.035000 18.850000 15.220000 2.566798 NA NA NA NA NA NA NA NA 2
rayyan-33185285 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 1
rayyan-33185655 NA NA NA NA 266.0000 266.0000 266.0000 NA NA NA NA NA NA NA NA NA NA NA NA NA 1
rayyan-33185673 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 74.60000 74.60000 74.60000 NA 1
rayyan-33185780 89.36252 186.04800 36.25600 28.4751755 719.8610 1889.4072 165.7881 343.10499 15.822703 30.007000 6.350200 4.880071 128.556361 307.6396 28.29905 62.970993 NA NA NA NA 74
rayyan-33185781 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 153.57500 317.90000 75.00000 46.8768469 52
rayyan-33185785 NA NA NA NA 268.0000 459.0000 157.0000 94.00473 NA NA NA NA 46.600000 91.0000 24.00000 19.351715 NA NA NA NA 10
rayyan-33185798 78.38389 121.23045 46.37760 36.0588349 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 4
rayyan-33185801 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 1
rayyan-33185803 265.90000 668.70000 127.00000 227.2101890 216.3000 728.3000 54.8000 287.70053 41.420000 76.200000 29.800000 19.663342 22.960000 58.1000 11.20000 19.839430 NA NA NA NA 5
rayyan-33185807 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 490.00000 490.00000 490.00000 NA 1
rayyan-33185817 78.79500 110.50000 56.33000 18.6703133 445.2200 695.1800 268.3000 160.72851 11.318750 19.130000 8.360000 3.427921 23.223750 46.3400 13.63000 10.103183 NA NA NA NA 8
rayyan-33185819 257.70218 257.70218 257.70218 NA 260.7721 260.7721 260.7721 NA NA NA NA NA NA NA NA NA NA NA NA NA 1
rayyan-33185820 NA NA NA NA NA NA NA NA NA NA NA NA 28.700000 30.3000 27.10000 2.262742 NA NA NA NA 2
rayyan-33185881 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 132.20000 132.20000 132.20000 NA 1
rayyan-33185890 170.90000 170.90000 170.90000 NA NA NA NA NA 37.000000 37.000000 37.000000 NA NA NA NA NA NA NA NA NA 1
rayyan-33185896 89.36000 109.02000 76.65000 14.2078077 NA NA NA NA 17.317500 19.440000 13.970000 2.377749 NA NA NA NA NA NA NA NA 4
rayyan-33185905 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 139.82000 139.82000 139.82000 NA 1
rayyan-33185912 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 9
rayyan-33185919 43.94286 51.90000 34.30000 6.4562262 186.2286 225.6000 131.4000 33.57766 6.042857 7.700000 4.600000 1.357519 9.057143 12.2000 6.40000 2.330849 NA NA NA NA 7
rayyan-33185922 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 4
rayyan-33185928 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 121.65500 151.20000 92.11000 41.7829397 2
rayyan-33185981 NA NA NA NA NA NA NA NA 15.200000 15.200000 15.200000 NA NA NA NA NA NA NA NA NA 1
rayyan-33185991 78.68650 83.67400 72.53900 4.8649538 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 4
rayyan-33186081 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 406.00000 406.00000 406.00000 NA 1
rayyan-33186083 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 183.64000 183.64000 183.64000 NA 1
rayyan-33186084 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 100.19000 100.19000 100.19000 NA 1
rayyan-33186095 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 2
rayyan-33186100 NA NA NA NA NA NA NA NA 4.706667 7.100000 2.800000 1.275856 NA NA NA NA NA NA NA NA 15
rayyan-33186105 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 397.26315 418.87720 375.64910 30.5668826 2
rayyan-33186107 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 1
rayyan-33186108 336.50000 348.00000 328.00000 9.2209905 NA NA NA NA 61.800000 64.700000 58.700000 2.741046 NA NA NA NA NA NA NA NA 4
rayyan-33186111 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 156.00000 156.00000 156.00000 NA 1
rayyan-33186112 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 1
rayyan-33186116 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 2
rayyan-33186120 54.70000 54.70000 54.70000 NA NA NA NA NA 9.080000 9.080000 9.080000 NA NA NA NA NA NA NA NA NA 1
rayyan-33186121 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 151.10000 173.30000 128.90000 31.3955411 2
rayyan-33186182 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 3
rayyan-33186189 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 72.57500 74.69000 70.46000 2.9910617 2
rayyan-33186200 65.68500 74.16000 57.21000 11.9854599 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 2
rayyan-33186205 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 181.47000 181.47000 181.47000 NA 1
rayyan-33186206 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 122.38000 132.72000 114.48000 7.6032318 4
rayyan-33186208 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 108.11000 108.11000 108.11000 NA 1
rayyan-33186211 NA NA NA NA NA NA NA NA 5.357174 7.721201 3.158679 1.588524 NA NA NA NA NA NA NA NA 7
rayyan-33186212 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 181.56000 181.56000 181.56000 NA 1
rayyan-33186219 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 115.40000 115.40000 115.40000 NA 1
rayyan-33186224 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 141.05000 155.89000 126.21000 20.9869293 2
rayyan-33186225 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 228.60000 242.00000 215.20000 18.9504617 2
rayyan-33186227 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 302.20000 302.20000 302.20000 NA 1
rayyan-33186280 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 224.07333 270.06000 197.26000 40.0084058 3
rayyan-33186286 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 92.80000 93.00000 92.50000 0.2645751 3
rayyan-33186290 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 75.10000 75.10000 75.10000 NA 1
rayyan-33186293 16.22000 16.22000 16.22000 NA 41.0800 41.0800 41.0800 NA 2.560000 2.560000 2.560000 NA 3.420000 3.4200 3.42000 NA 307.24000 307.24000 307.24000 NA 1
rayyan-33186294 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 229.76000 229.76000 229.76000 NA 1
rayyan-33186297 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 621.20700 621.20700 621.20700 NA 1
rayyan-33186305 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 6940.00000 6940.00000 6940.00000 NA 1
rayyan-33186308 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 153.01667 170.20000 138.60000 13.9341906 6
rayyan-33186313 63.11977 78.63471 44.32959 8.2134960 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 75
rayyan-33186317 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 177.60000 177.60000 177.60000 NA 1
rayyan-33186325 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 272.16333 291.76000 259.73000 17.1745519 3
rayyan-33186327 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 391.48000 391.48000 391.48000 NA 1
rayyan-33186328 105.35875 118.32000 92.52000 11.2868209 NA NA NA NA 15.912500 21.010000 10.480000 4.356931 NA NA NA NA NA NA NA NA 8
rayyan-33186379 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 191.20000 191.20000 191.20000 NA 1
rayyan-33186384 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 97.53000 97.53000 97.53000 NA 1
rayyan-33186396 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 98.30000 98.30000 98.30000 NA 1
rayyan-33186404 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 0.01912 0.01912 0.01912 NA 1
rayyan-33186421 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 59.57516 89.84954 27.85492 21.5825675 10
rayyan-33186423 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 158.33750 198.17000 136.98000 28.4639332 4
rayyan-33186424 59.60000 59.60000 59.60000 NA 296.3000 296.3000 296.3000 NA 11.000000 11.000000 11.000000 NA 23.000000 23.0000 23.00000 NA NA NA NA NA 1
rayyan-33186479 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 154.50000 193.30000 120.40000 34.3531658 4
rayyan-33186484 NA NA NA NA 421.5600 421.5600 421.5600 NA NA NA NA NA 19.850000 19.8500 19.85000 NA NA NA NA NA 1
rayyan-33186485 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 362.50000 372.00000 353.00000 13.4350288 2
rayyan-33186492 83.80000 83.80000 83.80000 NA 439.0000 439.0000 439.0000 NA 16.800000 16.800000 16.800000 NA 40.360000 40.3600 40.36000 NA NA NA NA NA 1
rayyan-33186493 NA NA NA NA 228.7600 265.8100 191.7100 52.39661 NA NA NA NA 9.995000 11.1800 8.81000 1.675843 NA NA NA NA 2
rayyan-33186496 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 4
rayyan-33186506 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 341.87500 444.00000 257.00000 55.3003165 8
rayyan-33186518 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 178.28000 178.28000 178.28000 NA 1
rayyan-33186519 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 1
rayyan-33186584 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 115.40000 115.40000 115.40000 NA 1
rayyan-33186596 75.48500 80.65000 70.32000 7.3044130 NA NA NA NA 12.075000 13.100000 11.050000 1.449569 NA NA NA NA 186.56500 206.99000 166.14000 28.8853120 2
rayyan-33186620 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 131.45000 190.40000 71.54000 65.6046269 4
rayyan-33186621 71.90906 90.67934 48.57124 7.7024691 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 60
rayyan-33186625 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 243.08000 261.16000 225.46000 14.8105300 5
rayyan-33186628 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 1
rayyan-33186684 400.00000 400.00000 400.00000 NA 280.0000 280.0000 280.0000 NA NA NA NA NA 29.700000 29.7000 29.70000 NA NA NA NA NA 1
rayyan-33186685 57.40353 68.89622 48.26075 4.7742315 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 59
rayyan-33186690 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 122.28000 122.28000 122.28000 NA 1
rayyan-33186704 87.00000 87.00000 87.00000 NA 439.1000 439.1000 439.1000 NA 14.900000 14.900000 14.900000 NA 13.300000 13.3000 13.30000 NA NA NA NA NA 1
rayyan-33186789 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 266.69604 266.69604 266.69604 NA 1
rayyan-33186801 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 168.11500 218.19000 139.34000 22.2362668 10
rayyan-33186803 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 156.19608 156.19608 156.19608 NA 1
rayyan-33186823 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 129.50000 161.90000 109.40000 22.5983775 4
rayyan-33186891 85.87895 122.10000 50.80000 17.1464794 NA NA NA NA 14.915790 22.800000 8.200000 4.493732 NA NA NA NA NA NA NA NA 19
rayyan-33186984 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 191.40000 191.40000 191.40000 NA 1
rayyan-33186990 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 184.60000 197.30000 171.90000 17.9605122 2
rayyan-33187004 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 139.00000 139.00000 139.00000 NA 1
rayyan-33187005 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 179.73000 179.73000 179.73000 NA 1
rayyan-33187009 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 38.82000 38.82000 38.82000 NA 1
rayyan-33187015 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 1
rayyan-33187026 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 145.09000 150.71000 141.02000 3.4772288 6
rayyan-33187027 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 81.25000 81.25000 81.25000 NA 1
rayyan-33187079 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 190.25000 236.00000 160.00000 23.5356872 8
rayyan-33187080 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 41.52199 46.99074 34.02778 5.4298517 4
rayyan-33187087 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 324.00000 324.00000 324.00000 NA 1
rayyan-33187103 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 152.61454 155.20987 150.02469 2.0745338 9
rayyan-33187114 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 106.20000 110.80000 104.00000 3.1198291 4
rayyan-33187116 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 255.60667 274.04000 243.92000 16.1536910 3
rayyan-33187182 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 544.80000 754.00000 335.60000 295.8534772 2
rayyan-33187198 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 546.35333 765.03000 407.78000 191.6228030 3
rayyan-33187201 NA NA NA NA 750.3900 750.3900 750.3900 NA NA NA NA NA 48.450000 48.4500 48.45000 NA NA NA NA NA 1
rayyan-33187210 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 259.21500 276.27000 242.16000 24.1194123 2
rayyan-33187218 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 107.70000 107.70000 107.70000 NA 1
rayyan-33187284 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 212.47000 212.47000 212.47000 NA 1
rayyan-33187321 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 259.00000 259.00000 259.00000 NA 1
rayyan-33187323 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 195.15000 218.30000 172.00000 32.7390440 2
rayyan-33187326 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 2
rayyan-33187379 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 396.00000 396.00000 396.00000 NA 1
rayyan-33187386 91.72000 92.20000 91.24000 0.6788225 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 2
rayyan-33187393 NA NA NA NA NA NA NA NA NA NA NA NA 25.000000 25.0000 25.00000 NA NA NA NA NA 1
rayyan-33187394 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 253.20000 253.20000 253.20000 NA 1
rayyan-33187419 38.22502 47.19851 28.38398 9.7233924 NA NA NA NA 12.733002 17.837841 6.787559 4.823711 NA NA NA NA NA NA NA NA 4
rayyan-33187421 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 241.00000 241.00000 241.00000 NA 1
rayyan-33187502 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 2
rayyan-33187593 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 48.25000 48.25000 48.25000 NA 1
rayyan-33187601 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 36.15000 36.90000 35.40000 1.0606602 2
rayyan-33187610 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 104.47250 121.97000 95.04000 12.6076680 4
rayyan-33187617 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 118.70000 118.70000 118.70000 NA 1
rayyan-33187620 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 125.02000 125.02000 125.02000 NA 1
rayyan-33187626 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 62.15736 81.36067 52.50532 13.0214582 4
rayyan-33187693 94.70000 94.70000 94.70000 NA NA NA NA NA NA NA NA NA 85.000000 85.0000 85.00000 NA NA NA NA NA 1
rayyan-33187708 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 361.00000 363.20000 358.80000 3.1112698 2
rayyan-33187720 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 111.45000 113.30000 109.60000 2.6162951 2
rayyan-33187788 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 151.08000 151.08000 151.08000 NA 1
rayyan-33189096 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 1
rayyan-33189101 68.38000 68.38000 68.38000 NA NA NA NA NA 11.350000 11.350000 11.350000 NA NA NA NA NA NA NA NA NA 1
rayyan-33189111 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 180.07500 187.81000 172.34000 10.9389419 2
rayyan-33189116 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 385.00000 385.00000 385.00000 NA 1
rayyan-33189128 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 116.36000 116.36000 116.36000 NA 1
rayyan-33189137 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 64.83871 64.83871 64.83871 NA 1
rayyan-33189143 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 62.97000 62.97000 62.97000 NA 1
rayyan-33189147 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 100.05833 105.50000 93.30000 3.1454031 12
rayyan-33189201 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 186.79000 186.79000 186.79000 NA 1
rayyan-33189260 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 61.17500 65.53000 57.42000 3.6778844 4
rayyan-33189310 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 60.00000 60.00000 60.00000 NA 1
rayyan-33189386 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 82.04000 82.04000 82.04000 NA 1
rayyan-33189505 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 141.13000 141.13000 141.13000 NA 1
rayyan-33189548 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 120.00000 120.00000 120.00000 NA 1
rayyan-33189575 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 112.00000 112.00000 112.00000 NA 1
rayyan-33189578 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 81.60000 81.60000 81.60000 NA 1
rayyan-33189584 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 86.36000 86.36000 86.36000 NA 1
rayyan-33189611 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 160.30000 160.30000 160.30000 NA 1
rayyan-33189616 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 1537.70000 1537.70000 1537.70000 NA 1
rayyan-33189628 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 14.54500 14.70000 14.39000 0.2192031 2
rayyan-33189689 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 440.00000 460.00000 410.00000 26.4575131 3
rayyan-33189783 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 254.92000 254.92000 254.92000 NA 1
rayyan-33189793 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 174.47496 178.02908 170.92084 5.0262847 2
rayyan-33189812 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 129.90000 129.90000 129.90000 NA 1
rayyan-33189873 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 106.37000 106.37000 106.37000 NA 1
rayyan-33197791 89.64000 89.64000 89.64000 NA NA NA NA NA 8.580000 8.580000 8.580000 NA NA NA NA NA NA NA NA NA 1
rayyan-33197855 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 132.84000 132.84000 132.84000 NA 1
rayyan-33199533 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 130.74000 130.74000 130.74000 NA 1
rayyan-36980058 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 148.20000 182.30000 122.20000 30.8579001 3
rayyan-37034231 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 244.08333 294.14000 179.83000 58.4624070 3
rayyan-37034386 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 139.88000 139.88000 139.88000 NA 1
rayyan-37038224 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 355.00000 416.00000 294.00000 86.2670273 2

Exploratory histograms

plot_histogram(dat)

For each cell size trait, I examined the frequency distribution histogram (on a log10 scale) and set a lower and an upper threshold. These thresholds were somewhat arbitrary, but they allowed manual inspection of any values falling below or above them. Where an error was detected, I went back to the original source and rechecked the data thoroughly.
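The per-trait checks that follow all share the same pattern, which can be sketched as a single base R helper. This is only an illustrative sketch: the function name `flag_outliers`, its arguments, and the toy data frame are assumptions introduced here and are not part of the original analysis code.

```r
# Hypothetical helper (not part of the original workflow): return the rows whose
# log10-transformed trait value falls below `lower` or above `upper`.
flag_outliers <- function(data, trait, lower, upper) {
  values <- data[[trait]]
  # Only finite, positive values can be log10-transformed safely
  ok <- is.finite(values) & values > 0
  log_values <- rep(NA_real_, length(values))
  log_values[ok] <- log10(values[ok])
  flagged <- ok & (log_values < lower | log_values > upper)
  out <- data[flagged, , drop = FALSE]
  out[[paste0("log10_", trait)]] <- log_values[flagged]
  # Sort by the raw trait value to ease manual inspection
  out[order(out[[trait]]), , drop = FALSE]
}

# Toy data with made-up study keys, for illustration only
toy <- data.frame(
  key = c("study_a", "study_b", "study_c", "study_d"),
  cell_area = c(16.22, 80, NA, 944.7),
  stringsAsFactors = FALSE
)

flagged <- flag_outliers(toy, "cell_area", lower = 1.3, upper = 2.9)
print(flagged)
```

With these toy thresholds, only the smallest and largest values are returned for inspection; the missing value and the mid-range value are ignored.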

Check data of cell area via histogram

dat %>%
  filter(is.finite(cell_area)) %>%
  ggplot(aes(log10(cell_area))) + 
  geom_histogram(fill = "firebrick", col = "black", binwidth = 0.05) +
  theme_classic() + 
  labs(title = "Cell Area")

dat %>%   
  filter(is.finite(cell_area)) %>%
  mutate(log10_cell_area = log10(cell_area)) %>%  # Calculate log10(cell_area)
  filter(log10_cell_area < 1.3 | log10_cell_area > 2.9) %>%  # Apply thresholds, based on data distribution
  select(key, class, species_reported, log10_cell_area, cell_area) %>%  # Select relevant columns
  arrange(key, cell_area)
##                           key          class                species_reported
## 1 pdf_not_found_Gulliver_1875         Dipnoi Protopterus annectens annectens
## 2             rayyan-33186293 Actinopterygii         Iranocichla hormuzensis
##   log10_cell_area cell_area
## 1        2.975294    944.70
## 2        1.210051     16.22

Check data of cell volume via histogram

ggplot(dat %>% filter(is.finite(cell_volume)), aes(log10(cell_volume))) + 
  geom_histogram(fill = "firebrick", color = "black", binwidth = 0.05) +
  theme_classic() + 
  labs(title = "Cell Volume")

dat %>%
  filter(is.finite(cell_volume)) %>%
  mutate(log10_cell_volume = log10(cell_volume)) %>%  # Calculate log10(cell_volume)
  filter(log10_cell_volume < 2 | log10_cell_volume > 3) %>%  # Apply thresholds, based on data distribution
  select(key, class, species_reported, log10_cell_volume, cell_volume) %>%  # Select relevant columns
  arrange(key, cell_volume)
##                key          class           species_reported log10_cell_volume
## 1  rayyan-33185780 Actinopterygii      Pygocentrus nattereri          3.066334
## 2  rayyan-33185780 Actinopterygii      Pygocentrus nattereri          3.080239
## 3  rayyan-33185780 Actinopterygii      Pygocentrus nattereri          3.106438
## 4  rayyan-33185780 Actinopterygii      Pygocentrus nattereri          3.116222
## 5  rayyan-33185780 Actinopterygii     Synbranchus marmoratus          3.139796
## 6  rayyan-33185780 Actinopterygii      Pygocentrus nattereri          3.191964
## 7  rayyan-33185780 Actinopterygii     Synbranchus marmoratus          3.196839
## 8  rayyan-33185780 Actinopterygii     Synbranchus marmoratus          3.201626
## 9  rayyan-33185780 Actinopterygii     Synbranchus marmoratus          3.236037
## 10 rayyan-33185780 Actinopterygii     Synbranchus marmoratus          3.276326
## 11 rayyan-33185803 Actinopterygii          Crenilabrus tinca          1.738781
## 12 rayyan-33185803 Actinopterygii         Uranoscopus scaber          1.903090
## 13 rayyan-33185803 Actinopterygii Gaidropsarus mediterraneus          1.920645
## 14 rayyan-33186293 Actinopterygii    Iranocichla hormuzensis          1.613630
##    cell_volume
## 1     1165.023
## 2     1202.926
## 3     1277.725
## 4     1306.839
## 5     1379.736
## 6     1555.835
## 7     1573.400
## 8     1590.838
## 9     1722.017
## 10    1889.407
## 11      54.800
## 12      80.000
## 13      83.300
## 14      41.080

Check data of nucleus area via histogram

dat %>%
  filter(is.finite(nucleus_area)) %>%
  ggplot(aes(log10(nucleus_area))) + 
  geom_histogram(fill = "firebrick", col = "black", binwidth = 0.05) +
  theme_classic() + 
  labs(title = "Nucleus Area")

dat %>%
  filter(is.finite(nucleus_area)) %>%
  mutate(log10_nucleus_area = log10(nucleus_area)) %>%  # Calculate log10(nucleus_area)
  filter(log10_nucleus_area < 0.6 | log10_nucleus_area > 2) %>%  # Apply thresholds, based on data distribution
  select(key, class, species_reported, log10_nucleus_area, nucleus_area) %>%  # Select relevant columns
  arrange(key, nucleus_area)
##                            key          class                species_reported
## 1                    9_gregory Chondrichthyes             Oxynotus bruniensis
## 2                    9_gregory Chondrichthyes           Centroscymnus owstoni
## 3                    9_gregory Chondrichthyes        Centroscymnus coelolepis
## 4                    9_gregory Chondrichthyes           Etmopterus granulosus
## 5                    9_gregory Chondrichthyes              Squatina australis
## 6                    9_gregory Chondrichthyes           Etmopterus brachyurus
## 7                    9_gregory Chondrichthyes        Centroscymnus crepidater
## 8                    9_gregory Chondrichthyes          Centroscymnus plunketi
## 9  pdf_not_found_Gulliver_1875         Dipnoi Protopterus annectens annectens
## 10             rayyan-33186100 Actinopterygii               Upeneus maculatus
## 11             rayyan-33186100 Actinopterygii              Balistes capriscus
## 12             rayyan-33186100 Actinopterygii         Bathystoma aurilineatum
## 13             rayyan-33186100 Actinopterygii                 Calamus calamus
## 14             rayyan-33186211 Actinopterygii              Micropterus coosae
## 15             rayyan-33186211 Actinopterygii                Perca flavescens
## 16             rayyan-33186293 Actinopterygii         Iranocichla hormuzensis
##    log10_nucleus_area nucleus_area
## 1           2.0440692   110.680000
## 2           2.0936668   124.070000
## 3           2.1041114   127.090000
## 4           2.1285931   134.460000
## 5           2.1537844   142.490000
## 6           2.1612482   144.960000
## 7           2.1689981   147.570000
## 8           2.1968115   157.330000
## 9           2.0795068   120.090000
## 10          0.4471580     2.800000
## 11          0.5185139     3.300000
## 12          0.5314789     3.400000
## 13          0.5797836     3.800000
## 14          0.4995055     3.158679
## 15          0.5575878     3.610670
## 16          0.4082400     2.560000

Check data of nucleus volume via histogram

dat %>%
  filter(is.finite(nucleus_volume)) %>%
  ggplot(aes(log10(nucleus_volume))) + 
  geom_histogram(fill = "firebrick", col = "black", binwidth = 0.05) +
  theme_classic() + 
  labs(title = "Nucleus Volume")

dat %>%
  filter(is.finite(nucleus_volume)) %>%
  mutate(log10_nucleus_volume = log10(nucleus_volume)) %>%  # Calculate log10(nucleus_volume)
  filter(log10_nucleus_volume < 0.8 | log10_nucleus_volume > 2.5) %>%  # Apply thresholds, based on data distribution
  select(key, class, species_reported, log10_nucleus_volume, nucleus_volume) %>%  # Select relevant columns
  arrange(key, nucleus_volume)
##               key          class        species_reported log10_nucleus_volume
## 1 rayyan-33186293 Actinopterygii Iranocichla hormuzensis            0.5340261
##   nucleus_volume
## 1           3.42

Check data of cell length via histogram

dat %>%      
  filter(is.finite(cell_length)) %>%
  ggplot(aes(log10(cell_length))) + 
  geom_histogram(fill = "firebrick", col = "black", binwidth = 0.05) +
  theme_classic() + 
  labs(title = "Cell Length")

dat %>%
  filter(is.finite(cell_length)) %>%
  mutate(log10_cell_length = log10(cell_length)) %>%  # Calculate log10(cell_length)
  filter(log10_cell_length < 0.8 | log10_cell_length > 1.5) %>%  # Apply thresholds, based on data distribution
  select(key, location_description, class, species_reported, log10_cell_length, cell_length) %>%  # Select relevant columns
  arrange(key,location_description, cell_length)
##                           key                        location_description
## 1                   3_gregory                                        <NA>
## 2 pdf_not_found_Gulliver_1875                                        <NA>
## 3 pdf_not_found_Gulliver_1875                                        <NA>
## 4 pdf_not_found_Gulliver_1875                                        <NA>
## 5             rayyan-33186293                          Mehran river, Iran
## 6             rayyan-33186496     Libong Island, Trang Province, Thailand
## 7             rayyan-33186496 Rajamangala Beach, Trang Province, Thailand
##            class                species_reported log10_cell_length cell_length
## 1         Dipnoi              Ceratodus forsteri         1.5910646       39.00
## 2 Chondrichthyes               Oxynotus centrina         1.5024271       31.80
## 3 Chondrichthyes                 Torpedo torpedo         1.5024271       31.80
## 4         Dipnoi Protopterus annectens annectens         1.6493349       44.60
## 5 Actinopterygii         Iranocichla hormuzensis         0.7411516        5.51
## 6 Actinopterygii             Gerres filamentosus         0.7888751        6.15
## 7 Actinopterygii             Leiognathus decorus         0.7972675        6.27

Check data of cell width via histogram

dat %>% 
  filter(is.finite(cell_width)) %>%
  ggplot(aes(log10(cell_width))) + 
  geom_histogram(fill = "firebrick", col = "black", binwidth = 0.05) +
  theme_classic() + 
  labs(title = "Cell Width")

dat %>%
  filter(is.finite(cell_width)) %>%
  mutate(log10_cell_width = log10(cell_width)) %>%  # Calculate log10(cell_width)
  filter(log10_cell_width < 0.6 | log10_cell_width > 1.3) %>%  # Apply thresholds, based on data distribution
  select(key, class, species_reported, log10_cell_width, cell_width) %>%  # Select relevant columns
  arrange(key, cell_width)
##                           key          class                species_reported
## 1                   3_gregory         Dipnoi              Ceratodus forsteri
## 2 pdf_not_found_Gulliver_1875 Chondrichthyes               Oxynotus centrina
## 3 pdf_not_found_Gulliver_1875 Chondrichthyes                 Torpedo torpedo
## 4 pdf_not_found_Gulliver_1875         Dipnoi Protopterus annectens annectens
## 5    pdf_not_found_Kisch_1951 Chondrichthyes               Torpedo nobiliana
## 6             rayyan-33186293 Actinopterygii         Iranocichla hormuzensis
##   log10_cell_width cell_width
## 1        1.3802112      24.00
## 2        1.4048337      25.40
## 3        1.4048337      25.40
## 4        1.4313638      27.00
## 5        1.3654880      23.20
## 6        0.5728716       3.74

Check data of nucleus length via histogram

dat %>%      
  filter(is.finite(nucleus_length)) %>%
  ggplot(aes(log10(nucleus_length))) + 
  geom_histogram(fill = "firebrick", col = "black", binwidth = 0.05) +
  theme_classic() + 
  labs(title = "Nucleus Length")

dat %>%
  filter(is.finite(nucleus_length)) %>%
  mutate(log10_nucleus_length = log10(nucleus_length)) %>%  # Calculate log10(nucleus_length)
  filter(log10_nucleus_length < 0.5 | log10_nucleus_length > 1) %>%  # Apply thresholds, based on data distribution
  select(key, class, species_reported, log10_nucleus_length, nucleus_length) %>%  # Select relevant columns
  arrange(key, nucleus_length)
##                           key          class                species_reported
## 1                   3_gregory         Dipnoi              Ceratodus forsteri
## 2 pdf_not_found_Gulliver_1875         Dipnoi Protopterus annectens annectens
## 3             rayyan-33186293 Actinopterygii         Iranocichla hormuzensis
## 4             rayyan-33186496 Actinopterygii             Leiognathus decorus
##   log10_nucleus_length nucleus_length
## 1            1.1461280          14.00
## 2            1.2430380          17.50
## 3            0.3502480           2.24
## 4            0.4955443           3.13

Check data of nucleus width via histogram

dat %>%      
  filter(is.finite(nucleus_width)) %>%
  ggplot(aes(log10(nucleus_width))) + 
  geom_histogram(fill = "firebrick", col = "black", binwidth = 0.05) +
  theme_classic() + 
  labs(title = "Nucleus Width")

dat %>%
  filter(is.finite(nucleus_width)) %>%
  mutate(log10_nucleus_width = log10(nucleus_width)) %>%  # Calculate log10(nucleus_width)
  filter(log10_nucleus_width < 0.25 | log10_nucleus_width > 0.85) %>%  # Apply thresholds, based on data distribution
  select(key, class, species_reported, log10_nucleus_width, nucleus_width) %>%  # Select relevant columns
  arrange(key, nucleus_width)
##                           key          class                species_reported
## 1                   3_gregory         Dipnoi              Ceratodus forsteri
## 2 pdf_not_found_Gulliver_1875         Dipnoi Protopterus annectens annectens
## 3             rayyan-33186293 Actinopterygii         Iranocichla hormuzensis
##   log10_nucleus_width nucleus_width
## 1           0.9890046          9.75
## 2           0.9444827          8.80
## 3           0.1583625          1.44

Check data of Mean Corpuscular Volume (MCV) via histogram

dat %>%
  filter(is.finite(mcv)) %>%
  ggplot(aes(log10(mcv))) + 
  geom_histogram(fill = "firebrick", col = "black", binwidth = 0.1) +
  theme_classic() + 
  labs(title = "Mean Corpuscular Volume")

dat %>%
  filter(is.finite(mcv)) %>%
  mutate(log10_mcv = log10(mcv)) %>%  # Calculate log10(mcv)
  filter(log10_mcv < 1 | log10_mcv > 3) %>%  # Apply thresholds, based on data distribution
  select(key, class, species_reported, log10_mcv, mcv) %>%  # Select relevant columns
  arrange(key, mcv)
##               key          class        species_reported log10_mcv        mcv
## 1 rayyan-33186305         Dipnoi Protopterus aethiopicus  3.841359 6940.00000
## 2 rayyan-33186404 Actinopterygii     Heterotis niloticus -1.718512    0.01912
## 3 rayyan-33189616 Chondrichthyes   Scyliorhinus canicula  3.186872 1537.70000

For one study (rayyan-33186404), I observed that the decimal notation (full stops versus commas) was inconsistent. I therefore recalculated the MCV from the haematocrit and erythrocyte count values using the formula described in the main text of our manuscript, which yielded approximately 187.1622 μm³. I also set the corresponding MCV error to NA.

dat <- dat %>%
  mutate(
    mcv = ifelse(species_reported == "Heterotis niloticus" & key == "rayyan-33186404", 187.1622, mcv),
    mcv_error = ifelse(species_reported == "Heterotis niloticus" & key == "rayyan-33186404", NA, mcv_error)
  )
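The recalculation of MCV from haematocrit and erythrocyte count can be sketched as follows. This sketch assumes the standard haematological relationship, MCV (fL, equivalent to μm³) = haematocrit (%) × 10 / erythrocyte count (10⁶ cells per μL); the manuscript's own formula is not reproduced in this document, so the match is an assumption. The function name `mcv_from_hct_rbc` and the input values are hypothetical and are not the values reported in the study.

```r
# Hypothetical helper: mean corpuscular volume from haematocrit and RBC count.
# Assumes haematocrit in percent and RBC count in millions of cells per microlitre,
# which gives MCV in femtolitres (1 fL = 1 cubic micrometre).
mcv_from_hct_rbc <- function(haematocrit_percent, rbc_millions_per_ul) {
  haematocrit_percent * 10 / rbc_millions_per_ul
}

# Illustrative inputs only (not taken from any study in the database)
example_mcv <- mcv_from_hct_rbc(haematocrit_percent = 25, rbc_millions_per_ul = 1.25)
print(example_mcv)
```

If the source reports the erythrocyte count in other units (for example cells per litre), the count has to be rescaled before applying the formula.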

After making this adjustment, I noticed that two distinct studies in the database reported nearly the same MCV value for this species. The author groups were similar and the mean body mass of the fish was identical, which suggests that both studies describe the same fish. This corroborated the problem with the decimal notation, so the correct MCV value is 191.2 μm³, as reported in rayyan-33186379.

dat %>%
  filter(species_reported == "Heterotis niloticus") %>%
  select(key, species_reported,body_mass_gram, mcv, mcv_error)
##               key    species_reported body_mass_gram      mcv mcv_error
## 1 rayyan-33186379 Heterotis niloticus          429.4 191.2000     13.74
## 2 rayyan-33186404 Heterotis niloticus          429.4 187.1622        NA

As a result of this, the study containing the error was excluded from the database. In Figure 1, this study will be labelled under “RBC expressed in wrong units”.

# Let's exclude that study
dat <- dat  %>%
  filter(key != "rayyan-33186404")

# Let's check the MCV data again
dat %>%
  filter(is.finite(mcv)) %>%
  ggplot(aes(log10(mcv))) + 
  geom_histogram(fill = "firebrick", col = "black", binwidth = 0.1) +
  theme_classic() + 
  labs(title = "Mean Corpuscular Volume")

dat %>%
  filter(is.finite(mcv)) %>%
  mutate(log10_mcv = log10(mcv)) %>%  # Calculate log10(mcv)
  filter(log10_mcv < 1.2 | log10_mcv > 3) %>%  # Apply thresholds, based on data distribution
  select(key, class, species_reported, log10_mcv, mcv) %>%  # Select relevant columns
  arrange(key, mcv)
##               key          class        species_reported log10_mcv     mcv
## 1 rayyan-33186305         Dipnoi Protopterus aethiopicus  3.841359 6940.00
## 2 rayyan-33189616 Chondrichthyes   Scyliorhinus canicula  3.186872 1537.70
## 3 rayyan-33189628 Actinopterygii      Solea senegalensis  1.158061   14.39
## 4 rayyan-33189628 Actinopterygii      Solea senegalensis  1.167317   14.70

Calculation of Cell and Nuclear Dimensions

I also calculated the cell area, cell volume, nuclear area, and nuclear volume from the available length and width data. This allows these parameters to be derived when direct measurements are unavailable, thereby enhancing the completeness of ErythroCite.

For this, I employed standard formulae to calculate the area and the volume of the cell or its nucleus, assuming that both the cell and its nucleus were shaped like ellipsoids or oblate spheroids (Benfey & Sutterlin, 1984; Gregory, 2024).

The formula for cellular area (A) is:

\[A = \pi \times \frac{a}{2} \times \frac{b}{2}\]

# Calculate cell area and volume if missing
dat$cell_area <- ifelse(is.na(dat$cell_area),
                                   pi * (dat$cell_length/2) * (dat$cell_width/2),
                                   dat$cell_area)

dat$cell_volume <- ifelse(is.na(dat$cell_volume),
                                     (4/3) * pi * (dat$cell_length/2) * (dat$cell_width/2)^2,
                                     dat$cell_volume)

The formula for cellular volume (V) is:

\[V = \frac{4}{3} \times \pi \times \frac{a}{2} \times \left(\frac{b}{2}\right)^2\]

Where ‘a’ and ‘b’ denote the lengths of the major and minor axes (that is, the length and width of the cell or nucleus), so that a/2 and b/2 are the semi-major and semi-minor axes, respectively.

# Calculate nuclear area and volume if missing
dat$nucleus_area <- ifelse(is.na(dat$nucleus_area),
                                      pi * (dat$nucleus_length/2) * (dat$nucleus_width/2),
                                      dat$nucleus_area)

dat$nucleus_volume <- ifelse(is.na(dat$nucleus_volume),
                                        (4/3) * pi * (dat$nucleus_length/2) * (dat$nucleus_width/2)^2,
                                        dat$nucleus_volume)
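As a worked illustration of the two formulae above, the sketch below wraps them in small helper functions and applies them to made-up dimensions. The function names `ellipse_area` and `spheroid_volume` and the input values are assumptions introduced for this example; they do not appear in the original code.

```r
# Hypothetical helpers implementing the area and volume formulae given above.
# `len` and `wid` are the full length (major axis) and width (minor axis).
ellipse_area <- function(len, wid) {
  pi * (len / 2) * (wid / 2)
}

spheroid_volume <- function(len, wid) {
  (4 / 3) * pi * (len / 2) * (wid / 2)^2
}

# Illustrative dimensions in micrometres (not taken from the database)
example_area   <- ellipse_area(len = 12, wid = 8)     # pi * 6 * 4
example_volume <- spheroid_volume(len = 12, wid = 8)  # (4/3) * pi * 6 * 4^2
print(c(area = example_area, volume = example_volume))

# Missing inputs propagate as NA, so a record without length or width
# simply stays missing after the calculation
print(ellipse_area(len = NA, wid = 8))
```

Because missing inputs return NA, the same helpers can be used inside the `ifelse(is.na(...), ...)` pattern shown above without special handling.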

Transformation of all errors to SDs, where possible

I will apply transformations to the error columns (cell_length_error, cell_width_error, cell_area_error, cell_volume_error, mcv_error, nucleus_length_error, nucleus_width_error, nucleus_area_error, and nucleus_volume_error) based on the estimate_error_type column:

unique(dat$estimate_error_type)
## [1] NA      "2SE"   "SD"    "SE"    "95_CI"

For these types of error, the transformations are based on well-known formulae:

  1. SD (Standard Deviation): The value remains unchanged if the error type is “SD”.

  2. SE (Standard Error): The error is converted to SD using the formula:

\[ \text{SD} = \text{SE} \times \sqrt{N} \]

where \(N\) is the number of specimens used for the trait mean estimate.

  1. 2SE (Twice the Standard Error): The error is converted to SD using the formula:

\[ \text{SD} = \left(\frac{\text{2SE}}{2}\right) \times \sqrt{N} = \text{SE} \times \sqrt{N} \]

where \(N\) is the number of specimens used for the trait mean estimate.

  4. 95% CI (95% Confidence Interval): The error is converted to SD using the formula:

\[ \text{SD} = \left(\frac{\text{95\% CI}}{1.96}\right) \times \sqrt{N} \]

where \(N\) is the number of specimens, and \(1.96\) corresponds to the Z-score for a 95% confidence interval.
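To see that these conversions agree with one another, the sketch below recovers the same SD from hypothetical errors reported for 16 specimens. `to_sd()` is an illustrative helper rather than part of the original script, and the 95% CI case assumes that the reported value is the half-width of the interval.

```r
# Convert a single reported error to an SD, given its type and sample size
to_sd <- function(error, type, n) {
  switch(type,
         "SD"    = error,
         "SE"    = error * sqrt(n),
         "2SE"   = (error / 2) * sqrt(n),
         "95_CI" = (error / 1.96) * sqrt(n),
         NA_real_)  # unknown error type: no conversion possible
}

n <- 16  # hypothetical number of specimens, so sqrt(n) = 4

sd_from_sd  <- to_sd(2.00, "SD", n)
sd_from_se  <- to_sd(0.50, "SE", n)
sd_from_2se <- to_sd(1.00, "2SE", n)
sd_from_ci  <- to_sd(0.98, "95_CI", n)

c(sd_from_sd, sd_from_se, sd_from_2se, sd_from_ci)
```

All four calls give an SD of 2, because each reported error is the same underlying standard error of 0.5 expressed on a different scale.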

# transform 'number_of_specimens' to numeric
dat$number_of_specimens <- as.numeric(as.character(dat$number_of_specimens))

# Apply transformation to all relevant error columns

dat <- dat %>%
  mutate(
    
    # Transforming cell_length_error to SD
    
    cell_length_sd = case_when(
      estimate_error_type == "SD" ~ cell_length_error,  # Keep SD values unchanged
      estimate_error_type == "SE" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ cell_length_error * sqrt(number_of_specimens),  # Convert SE to SD
      estimate_error_type == "2SE" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ (cell_length_error / 2) * sqrt(number_of_specimens),  # Convert 2SE to SD
      estimate_error_type == "95_CI" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ (cell_length_error / 1.96) * sqrt(number_of_specimens),  # Convert 95% CI to SD
      TRUE ~ NA_real_  # Assign NA if conversion is not possible
    ),
    # Transforming cell_width_error to SD
    cell_width_sd = case_when(
      estimate_error_type == "SD" ~ cell_width_error,  # Keep SD values unchanged
      estimate_error_type == "SE" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ cell_width_error * sqrt(number_of_specimens),  # Convert SE to SD
      estimate_error_type == "2SE" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ (cell_width_error / 2) * sqrt(number_of_specimens),  # Convert 2SE to SD
      estimate_error_type == "95_CI" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ (cell_width_error / 1.96) * sqrt(number_of_specimens),  # Convert 95% CI to SD
      TRUE ~ NA_real_  # Assign NA if conversion is not possible
    ),
    
    # Transforming cell_area_error to SD
    
    cell_area_sd = case_when(
      estimate_error_type == "SD" ~ cell_area_error,  # Keep SD values unchanged
      estimate_error_type == "SE" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ cell_area_error * sqrt(number_of_specimens),  # Convert SE to SD
      estimate_error_type == "2SE" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ (cell_area_error / 2) * sqrt(number_of_specimens),  # Convert 2SE to SD
      estimate_error_type == "95_CI" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ (cell_area_error / 1.96) * sqrt(number_of_specimens),  # Convert 95% CI to SD
      TRUE ~ NA_real_  # Assign NA if conversion is not possible
    ),
    
    # Transforming cell_volume_error to SD
    
    cell_volume_sd = case_when(
      estimate_error_type == "SD" ~ cell_volume_error,  # Keep SD values unchanged
      estimate_error_type == "SE" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ cell_volume_error * sqrt(number_of_specimens),  # Convert SE to SD
      estimate_error_type == "2SE" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ (cell_volume_error / 2) * sqrt(number_of_specimens),  # Convert 2SE to SD
      estimate_error_type == "95_CI" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ (cell_volume_error / 1.96) * sqrt(number_of_specimens),  # Convert 95% CI to SD
      TRUE ~ NA_real_  # Assign NA if conversion is not possible
    ),
    
    # Transforming mcv_error to SD
    
    mcv_sd = case_when(
      estimate_error_type == "SD" ~ mcv_error,  # Keep SD values unchanged
      estimate_error_type == "SE" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ mcv_error * sqrt(number_of_specimens),  # Convert SE to SD
      estimate_error_type == "2SE" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ (mcv_error / 2) * sqrt(number_of_specimens),  # Convert 2SE to SD
      estimate_error_type == "95_CI" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ (mcv_error / 1.96) * sqrt(number_of_specimens),  # Convert 95% CI to SD
      TRUE ~ NA_real_  # Assign NA if conversion is not possible
    ),
    
    # Transforming nucleus_length_error to SD
    
    nucleus_length_sd = case_when(
      estimate_error_type == "SD" ~ nucleus_length_error,  # Keep SD values unchanged
      estimate_error_type == "SE" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ nucleus_length_error * sqrt(number_of_specimens),  # Convert SE to SD
      estimate_error_type == "2SE" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ (nucleus_length_error / 2) * sqrt(number_of_specimens),  # Convert 2SE to SD
      estimate_error_type == "95_CI" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ (nucleus_length_error / 1.96) * sqrt(number_of_specimens),  # Convert 95% CI to SD
      TRUE ~ NA_real_  # Assign NA if conversion is not possible
    ),
    
    # Transforming nucleus_width_error to SD
    
    nucleus_width_sd = case_when(
      estimate_error_type == "SD" ~ nucleus_width_error,  # Keep SD values unchanged
      estimate_error_type == "SE" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ nucleus_width_error * sqrt(number_of_specimens),  # Convert SE to SD
      estimate_error_type == "2SE" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ (nucleus_width_error / 2) * sqrt(number_of_specimens),  # Convert 2SE to SD
      estimate_error_type == "95_CI" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ (nucleus_width_error / 1.96) * sqrt(number_of_specimens),  # Convert 95% CI to SD
      TRUE ~ NA_real_  # Assign NA if conversion is not possible
    ),
    
    # Transforming nucleus_area_error to SD
    
    nucleus_area_sd = case_when(
      estimate_error_type == "SD" ~ nucleus_area_error,  # Keep SD values unchanged
      estimate_error_type == "SE" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ nucleus_area_error * sqrt(number_of_specimens),  # Convert SE to SD
      estimate_error_type == "2SE" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ (nucleus_area_error / 2) * sqrt(number_of_specimens),  # Convert 2SE to SD
      estimate_error_type == "95_CI" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ (nucleus_area_error / 1.96) * sqrt(number_of_specimens),  # Convert 95% CI to SD
      TRUE ~ NA_real_  # Assign NA if conversion is not possible
    ),
    
    # Transforming nucleus_volume_error to SD
    
    nucleus_volume_sd = case_when(
      estimate_error_type == "SD" ~ nucleus_volume_error,  # Keep SD values unchanged
      estimate_error_type == "SE" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ nucleus_volume_error * sqrt(number_of_specimens),  # Convert SE to SD
      estimate_error_type == "2SE" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ (nucleus_volume_error / 2) * sqrt(number_of_specimens),  # Convert 2SE to SD
      estimate_error_type == "95_CI" & !is.na(number_of_specimens) & number_of_specimens > 1 ~ (nucleus_volume_error / 1.96) * sqrt(number_of_specimens),  # Convert 95% CI to SD
      TRUE ~ NA_real_  # Assign NA if conversion is not possible
    )
  )
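The nine `case_when()` blocks above share the same logic, so the conversion could also be written once and applied to every error column. The following is a minimal base-R sketch on a hypothetical two-trait data frame; `error_to_sd()` and `toy` are illustrative names, and the actual pipeline uses the explicit `mutate()` call shown above.

```r
# Vectorised conversion mirroring the case_when() rules above
error_to_sd <- function(error, type, n) {
  ok_n <- !is.na(n) & n > 1            # sample size required for SE-based types
  out  <- rep(NA_real_, length(error)) # default: conversion not possible
  is_sd  <- !is.na(type) & type == "SD"
  is_se  <- !is.na(type) & type == "SE" & ok_n
  is_2se <- !is.na(type) & type == "2SE" & ok_n
  is_ci  <- !is.na(type) & type == "95_CI" & ok_n
  out[is_sd]  <- error[is_sd]
  out[is_se]  <- error[is_se] * sqrt(n[is_se])
  out[is_2se] <- (error[is_2se] / 2) * sqrt(n[is_2se])
  out[is_ci]  <- (error[is_ci] / 1.96) * sqrt(n[is_ci])
  out
}

# Hypothetical records: row 3 has n = 1 and row 4 has no error type
toy <- data.frame(
  estimate_error_type = c("SD", "SE", "2SE", NA),
  number_of_specimens = c(10, 25, 1, 8),
  cell_length_error   = c(1.2, 0.4, 0.6, 0.3),
  cell_width_error    = c(0.8, 0.2, 0.5, 0.1)
)

# Derive "<trait>_sd" from every column whose name ends in "_error"
error_cols <- grep("_error$", names(toy), value = TRUE)
for (col in error_cols) {
  sd_col <- sub("_error$", "_sd", col)
  toy[[sd_col]] <- error_to_sd(toy[[col]],
                               toy$estimate_error_type,
                               toy$number_of_specimens)
}

toy
```

Rows with a single specimen or an unknown error type end up as `NA`, matching the `TRUE ~ NA_real_` fallback of the original code.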

Extract names of countries using the geographical coordinates

# lines below take some time to run
dat <- dat %>%
  reverse_geocode(lat = lat_dec, 
                                long = long_dec, 
                                method = "osm") %>%
  mutate(country_collection = sub(".*,\\s*", "", address))  # Extract country name
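The regular expression in the last line keeps only the text after the final comma of the returned address. A small sketch with hypothetical address strings (not taken from the database) shows its behaviour, including the cases without a comma and with a missing address:

```r
# Hypothetical address strings in the style of a reverse-geocoding result
address <- c(
  "Bremerhaven, Bremen, 27570, Deutschland",
  "Nijmegen, Gelderland, Nederland",
  "Chile",  # no comma: returned unchanged
  NA        # missing address: stays NA
)

# ".*," is greedy, so it consumes everything up to the LAST comma;
# "\\s*" then removes any whitespace that follows it
country <- sub(".*,\\s*", "", address)
country
```

The extracted names are in whatever language the geocoding service returns, which is why they need to be harmonised afterwards.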

Rename countries using names in English and create a column to assign sub-continent.

Many of the country names are in their native languages; therefore, I changed all the names to their official English versions.

dat <- dat %>%
  mutate(country_collection = case_when(
    country_collection == "Bermuda" ~ "United Kingdom",
    country_collection == "Bosna i Hercegovina / Босна и Херцеговина" ~ "Bosnia and Herzegovina",
    country_collection == "Brasil" ~ "Brazil",
    country_collection == "Česko" ~ "Czechia",
    country_collection == "Congo" ~ "Democratic Republic of the Congo",
    country_collection == "Deutschland" ~ "Germany",
    country_collection == "España" ~ "Spain",
    country_collection == "Italia" ~ "Italy",
    country_collection == "Mauritius / Maurice" ~ "Mauritius",
    country_collection == "México" ~ "Mexico",
    country_collection == "Perú" ~ "Peru",
    country_collection == "Norge" ~ "Norway",
    country_collection == "Polska" ~ "Poland",
    country_collection == "Türkiye" ~ "Turkey",
    country_collection == "Sverige" ~ "Sweden",
    country_collection == "United States" ~ "United States of America",
    country_collection == "Ελλάς" ~ "Greece",
    country_collection == "Россия" ~ "Russia",
    country_collection == "Україна" ~ "Ukraine",
    country_collection == "العراق" ~ "Iraq",
    country_collection == "پاکستان" ~ "Pakistan",
    country_collection == "ایران" ~ "Iran",
    country_collection == "سوريا" ~ "Syria",
    country_collection == "مصر" ~ "Egypt",
    country_collection == "عمان" ~ "Oman",
    country_collection == "ประเทศไทย" ~ "Thailand",
    country_collection == "বাংলাদেশ" ~ "Bangladesh",
    country_collection == "대한민국" ~ "South Korea",
    country_collection == "中国" ~ "China",
    country_collection == "日本" ~ "Japan",
    TRUE ~ country_collection
  )) %>%
  mutate(subcontinent = case_when(
    country_collection %in% c("Argentina", "Brazil", "Chile", "Colombia", "Ecuador", "Peru", "Venezuela") ~ "South America",  # South America
    country_collection %in% c("United States of America", "Mexico", "Bermuda") ~ "North America",  # North America
    country_collection %in% c("United Kingdom", "Germany", "France", "Italy", "Spain", "Portugal", "Norway", "Sweden") ~ "Western Europe",  # Western Europe
    country_collection %in% c("Russia", "Ukraine", "Poland", "Czechia", "Greece", "Bosnia and Herzegovina") ~ "Eastern Europe",  # Eastern Europe
    country_collection %in% c("Iran", "Iraq", "Turkey", "Oman", "Syria") ~ "Middle East",  # Middle East
    country_collection %in% c("India", "Pakistan", "Bangladesh") ~ "South Asia",  # South Asia
    country_collection %in% c("China", "Japan", "South Korea") ~ "East Asia",  # East Asia
    country_collection %in% c("Malaysia", "Thailand") ~ "Southeast Asia",  # Southeast Asia
    country_collection %in% c("Egypt", "Nigeria", "Mauritius", "Niger", "Democratic Republic of the Congo") ~ "Africa",  # Africa
    country_collection %in% c("Australia") ~ "Oceania",  # Oceania
    TRUE ~ country_collection  # Default for unclassified countries
  ))
# check again
unique(dat$country_collection)
##  [1] NA                                 "United Kingdom"                  
##  [3] "United States of America"         "India"                           
##  [5] "China"                            "Brazil"                          
##  [7] "Mexico"                           "Czechia"                         
##  [9] "Venezuela"                        "Japan"                           
## [11] "Chile"                            "Iran"                            
## [13] "Turkey"                           "Ukraine"                         
## [15] "Pakistan"                         "Nigeria"                         
## [17] "Egypt"                            "Russia"                          
## [19] "Argentina"                        "Australia"                       
## [21] "Poland"                           "Iraq"                            
## [23] "Democratic Republic of the Congo" "Bosnia and Herzegovina"          
## [25] "Greece"                           "Italy"                           
## [27] "Ecuador"                          "Mauritius"                       
## [29] "Thailand"                         "Syria"                           
## [31] "Malaysia"                         "Bangladesh"                      
## [33] "South Korea"                      "Peru"                            
## [35] "Niger"                            "Colombia"                        
## [37] "Sweden"                           "Norway"                          
## [39] "Germany"                          "Spain"                           
## [41] "Portugal"                         "Oman"
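The renaming step is essentially a lookup: names found in a translation table are replaced, and every other value is passed through unchanged. A minimal base-R sketch of that idea, assuming a hypothetical three-entry subset of the mapping (`country_lookup` and `raw` are illustrative names):

```r
# Hypothetical subset of the native-to-English mapping
country_lookup <- c(
  "Brasil"      = "Brazil",
  "Deutschland" = "Germany",
  "Italia"      = "Italy"
)

raw <- c("Brasil", "India", "Italia", NA)

# Replace names found in the lookup; leave all others (and NA) as they are
matched <- raw %in% names(country_lookup)
english <- raw
english[matched] <- unname(country_lookup[raw[matched]])
english
```

This mirrors the `TRUE ~ country_collection` fallback used in `case_when()`: unmatched names such as "India" are kept as they are.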
# drop columns that are not needed downstream (taxo_level is kept)
dat_cleaned <- dat %>%
  select(-double_checked, -source, -isMarine, -isBrackish, -isFresh, -address)

# reorder columns
names(dat_cleaned)
##  [1] "species_reported"     "database"             "key"                 
##  [4] "body_mass_gram"       "sex"                  "life_stage"          
##  [7] "lat_dec"              "long_dec"             "location_description"
## [10] "sample_size"          "number_of_specimens"  "estimate_error_type" 
## [13] "cell_length"          "cell_length_error"    "cell_width"          
## [16] "cell_width_error"     "cell_area"            "cell_area_error"     
## [19] "cell_volume"          "cell_volume_error"    "mcv"                 
## [22] "mcv_error"            "nucleus_length"       "nucleus_length_error"
## [25] "nucleus_width"        "nucleus_width_error"  "nucleus_area"        
## [28] "nucleus_area_error"   "nucleus_volume"       "nucleus_volume_error"
## [31] "notes"                "phylum"               "class"               
## [34] "order"                "family"               "genus"               
## [37] "species"              "taxo_level"           "realm"               
## [40] "species_underscored"  "cell_length_sd"       "cell_width_sd"       
## [43] "cell_area_sd"         "cell_volume_sd"       "mcv_sd"              
## [46] "nucleus_length_sd"    "nucleus_width_sd"     "nucleus_area_sd"     
## [49] "nucleus_volume_sd"    "country_collection"   "subcontinent"
dat_cleaned <- dat_cleaned %>%
  select(database, key, phylum, class, order, family, genus, species, species_reported, species_underscored, taxo_level,
         realm,
         lat_dec, long_dec, location_description, country_collection, subcontinent,
         everything()) %>%
  mutate(across(where(is.numeric), ~round(., digits = 4)))

Figure_2: Studies over years

# Panel A: Extracting and cleaning publication years:
df_years <- refs %>%
  as.data.frame() %>%
  select(year) %>%
  mutate(year = as.numeric(as.character(year))) %>%  # Ensure year is numeric
  filter(!is.na(year))  # Remove NA values

# Counting studies per year and calculating cumulative values:
studies_per_year <- df_years %>%
  group_by(year) %>%
  summarise(num_studies = n()) %>%
  arrange(year) %>%
  mutate(cumulative_studies = cumsum(num_studies))  # Calculate cumulative count

# Plotting the cumulative number of studies:
plot_years <- ggplot(studies_per_year, aes(x = year, y = cumulative_studies)) +
  geom_line(color = "#00AFBB", linewidth = 2) +
  scale_y_continuous(limits = c(0, 200)) +  # Limit the y-axis to 200
  scale_x_continuous(
    limits = c(1875, 2025),
    breaks = seq(1875, 2025, by = 25)  # Set x-axis intervals to 25 years
  ) +
  labs(
    x = "Publication Year",
    y = "Number of Studies"
  ) +
  theme_pubr() +
  theme(
    axis.title.x = element_text(face = "bold", size = 12),
    axis.title.y = element_text(face = "bold", size = 12)
  )
plot_years

# ------------------------------------------------------------------------------
# Panel B: Extracting and counting journals
df_journals <- refs %>%
  as.data.frame() %>%
  select(journal) %>%
  filter(!is.na(journal)) %>%  # Remove missing journal entries
  group_by(journal) %>%
  summarise(num_articles = n()) %>%
  arrange(desc(num_articles)) %>%
  slice_head(n = 15)  # Select the top 15 journals

# Plotting the 15 most common journals
plot_journals <- ggplot(df_journals, aes(x = reorder(journal, num_articles), y = num_articles)) +
  geom_bar(stat = "identity", fill = "#00AFBB", width = 0.7) +  
  scale_y_continuous(limits = c(0, 20)) +  # Limit the y-axis to 20
  coord_flip() +
  theme_pubr() +
  labs(
    x = "Journal",
    y = "Number of Studies"
  ) +
  theme(
    axis.title.x = element_text(face = "bold", size = 12),
    axis.title.y = element_text(face = "bold", size = 12),
    axis.text.y = element_text(size = 10)  # Larger font size for journal names
  )
plot_journals

# ------------------------------------------------------------------------------
# Combining Panel A and Panel B
Figure_2 <- plot_grid(
  plot_years, 
  plot_journals,
  labels = c("A", "B"),
  nrow = 2,
  ncol = 1,
  label_size = 15
)

# Saving the combined figure
ggsave('../manuscript/Figure_2.pdf', Figure_2, width = 7, height = 9)
ggsave('../manuscript/Figure_2.png', Figure_2, width = 7, height = 9, dpi = 1200)
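The counting logic behind Panel A (studies per year, then a running total) can be reproduced without plotting. The sketch below uses a hypothetical vector of publication years in place of `refs$year`:

```r
# Hypothetical publication years, one entry per study
year <- c(1990, 1990, 1995, 2001, 2001, 2001, NA)

# table() drops NA by default and orders the years increasingly
counts <- table(year)

studies_per_year <- data.frame(
  year        = as.numeric(names(counts)),
  num_studies = as.vector(counts)
)

# Running total of studies published up to and including each year
studies_per_year$cumulative_studies <- cumsum(studies_per_year$num_studies)
studies_per_year
```

The last value of `cumulative_studies` equals the number of studies with a known publication year (6 in this toy example), which is a convenient sanity check against the y-axis limit chosen for the plot.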

Figure_3: Global map of studies

Assign color codes by realm

sort(unique(dat$realm))
## [1] "freshwater"                 "freshwater-brackish"       
## [3] "freshwater-brackish-marine" "marine"                    
## [5] "marine-brackish"

fishualize("Hypleurochilus_fissicornis")

class_realm <- c("freshwater",
                 "freshwater-brackish",
                 "freshwater-brackish-marine",
                 "marine",
                 "marine-brackish")

# plot for the SEB meeting 2025
sort(unique(dat$class))
## [1] "Actinopterygii" "Chondrichthyes" "Cyclostomata"   "Dipnoi"

class_type <- c("Actinopterygii",
                 "Chondrichthyes",
                 "Cyclostomata",
                 "Dipnoi")
# Let's use the fishualize package to colour the map (https://github.com/nschiett/fishualize)

fishualize("Scarus_quoyi")

Make plot of studies

filtered_data <- dat %>%
  filter(between(long_dec, -180, 180), between(lat_dec, -90, 90))

cell_size_studies <- ggplot(filtered_data, aes(x = long_dec, y = lat_dec)) +
  borders("world", colour = "black", fill = "white", linewidth = 0.1) +
  theme_map() +
  geom_point(aes(fill = realm), size = 2, shape = 21, colour = "black", stroke = 0.2) +
  scale_fill_fish_d(option = "Scarus_quoyi", 
                    labels =  class_realm, name = "") +
  coord_quickmap(expand = FALSE) +
  theme(
    legend.position = c(0.03, 0.2),
    legend.justification = c(0, 0),
    legend.background = element_rect(fill = "transparent", color = NA),  # Remove legend frame
    legend.margin = margin(2, 2, 2, 2),  # Reduce legend margin
    legend.text = element_text(size = 8),  # Reduce legend text size
    legend.title = element_text(size = 10, face = "bold"),
    legend.key.size = unit(0.8, "lines"),  # Reduce legend symbol size
    panel.background = element_rect(fill = "white"),
    axis.text = element_text(size = 10),
    axis.title = element_text(size = 10),
    axis.title.y = element_text(angle = 90, vjust = 0.5)  # Rotate y-axis title
  ) +
  guides(fill = guide_legend(override.aes = list(size = 2))) +
  scale_x_continuous(name = "Longitude (degrees)", breaks = seq(-180, 180, 60)) +
  scale_y_continuous(name = "Latitude (degrees)", breaks = seq(-90, 90, 30))
cell_size_studies
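The coordinate filter at the start of this block keeps only records with valid longitude and latitude; records with missing coordinates are dropped as well, because `filter()` discards rows where the condition is `NA`. A base-R sketch of the same rule on hypothetical records (`in_range()` and `toy` are illustrative names):

```r
# Hypothetical records: s2 has an impossible longitude, s3 a missing one
toy <- data.frame(
  key      = c("s1", "s2", "s3", "s4"),
  long_dec = c(-70.6, 200, NA, 8.6),
  lat_dec  = c(-33.4, 10, 45, 53.5)
)

# TRUE only for non-missing values inside the closed interval [lower, upper]
in_range <- function(x, lower, upper) !is.na(x) & x >= lower & x <= upper

keep <- in_range(toy$long_dec, -180, 180) & in_range(toy$lat_dec, -90, 90)
filtered <- toy[keep, ]
filtered$key
```

Only `s1` and `s4` remain, so the map shows just the records that can be placed on it.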

Summarise the data by country, counting occurrences of each key (study) and plot

dat_summary <- dat %>%
  group_by(country_collection) %>%
  filter(!is.na(country_collection)) %>%
  summarise(study_count = n_distinct(key))  # Count unique studies (keys) per country

names(dat_summary)
## [1] "country_collection" "study_count"
# Obtain geospatial data for countries
world <- ne_download(scale = 110, type = "countries", category = "cultural", returnclass = "sf")
## Reading layer `ne_110m_admin_0_countries' from data source 
##   `/private/var/folders/rl/4qqy5shj5nldjmznhtd96p5m0000gp/T/RtmpjF31Ek/ne_110m_admin_0_countries.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 177 features and 168 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## Geodetic CRS:  WGS 84
# Merge the geospatial data with the study data
world_data <- world %>%
  left_join(dat_summary, by = c("SOVEREIGNT" = "country_collection"))

# This plot was excluded after the first round of revisions
country_studies <- ggplot(world_data) + 
  geom_sf(aes(fill = study_count), color = "black") +  
  theme_map() +
  scale_fill_gradient(low = "white", 
                      high = "gray20", 
                      na.value = "white", 
                      name = "Nº Studies", 
                      limits = c(0, max(world_data$study_count, na.rm = TRUE))) +  
  coord_sf(expand = FALSE) +  
  theme(
    legend.position = c(0.03, 0.2),
    legend.justification = c(0, 0),
    legend.background = element_rect(fill = "transparent", color = NA),
    legend.margin = margin(2, 2, 2, 2),
    legend.text = element_text(size = 8),
    legend.title = element_text(size = 10, face = "bold"),
    legend.key.size = unit(0.8, "lines"),
    panel.background = element_rect(fill = "white"),
    axis.text = element_text(size = 10),
    axis.title = element_text(size = 10),
    axis.title.y = element_text(angle = 90, vjust = 0.5)
  ) +
  scale_x_continuous(name = "Longitude (degrees)", 
                     breaks = seq(-180, 180, 60), 
                     labels = scales::number_format(accuracy = 1)) +  
  scale_y_continuous(name = "Latitude (degrees)", 
                     breaks = seq(-90, 90, 30), 
                     labels = scales::number_format(accuracy = 1)) +  
  guides(fill = guide_legend(override.aes = list(size = 2)))
# Counting Studies Per Subcontinent
subconti_by_studies <- dat %>%
  filter(!is.na(subcontinent)) %>% 
  group_by(subcontinent) %>%
  summarise(num_studies = n_distinct(key)) %>%
  arrange(desc(num_studies))

# Plotting Studies by subcontinent
plot_subcont <- ggplot(subconti_by_studies, aes(x = reorder(subcontinent, num_studies), y = num_studies)) +
  geom_bar(stat = "identity", fill = "#00AFBB", width = 0.7) +
  coord_flip() +
  theme_pubr() +
  labs(
    x = "Subcontinent",
    y = "Number of Studies"
  ) +
  theme(
    axis.title.x = element_text(face = "bold", size = 12),
    axis.title.y = element_text(face = "bold", size = 12),
    axis.text.y = element_text(size = 10) 
  )
plot_subcont
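Both summaries above count distinct study keys rather than rows, which matters because one study can contribute many records. The sketch below contrasts the two counts on hypothetical data; `toy`, `records` and `studies` are illustrative names:

```r
# Hypothetical records: study "a" contributes three rows
toy <- data.frame(
  key          = c("a", "a", "a", "b", "c", "c"),
  subcontinent = c("South Asia", "South Asia", "South Asia",
                   "South Asia", "Oceania", "Oceania")
)

# Number of records (rows) per subcontinent
records <- sapply(split(toy$key, toy$subcontinent), length)

# Number of distinct studies per subcontinent
studies <- sapply(split(toy$key, toy$subcontinent),
                  function(k) length(unique(k)))

rbind(records, studies)
```

Here South Asia has four records but only two studies, so counting rows would overstate the research effort in that region.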

Combine plots and export

# For the revised version of our article, one of the reviewer's suggestions was
Figure_3 <- plot_grid(
  cell_size_studies, 
  # country_studies,
  plot_subcont,
  labels = c("A", "B"), 
  nrow = 2,  
  ncol = 1,  
  label_size = 15
)

# Store Plots
ggsave('../manuscript/Figure_3.pdf', Figure_3, width = 7, height = 10)
ggsave('../manuscript/Figure_3.png', Figure_3, width = 7, height = 10, dpi = 2000)

Figure_4: Phylogenetic tree with data

The following steps describe how to visualise the data associated with each trait and species within the phylogenetic tree. To accomplish this, we will first calculate mean values per species and trait, select the columns of interest, and standardise these values to enhance the colour contrast of the scale.
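The two steps described above (species means, then min-max scaling to the 0 to 1 range) can be sketched in base R as follows. The `toy` data frame and the `min_max()` helper are hypothetical; the sketch also suggests where the `NaN` entries in the printed summary come from, since the mean of a trait with no values for a species is `NaN` when `na.rm = TRUE`.

```r
# Hypothetical records: sp_c has no cell area reported
toy <- data.frame(
  species   = c("sp_a", "sp_a", "sp_b", "sp_c"),
  cell_area = c(40, 60, 100, NA)
)

# Mean per species; a species with only NA values yields NaN, not NA
means <- vapply(split(toy$cell_area, toy$species),
                mean, numeric(1), na.rm = TRUE)

# Min-max scaling across all species, ignoring missing values
min_max <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

scaled <- min_max(means)
scaled
```

The species with the smallest mean maps to 0 and the one with the largest to 1, so a single very large value compresses all other species towards the lower end of the colour scale.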

# Summarise data: calculate the mean value per species for each trait
summary_data <- dat %>%
  group_by(species_underscored) %>%
  summarise(across(c(cell_area, cell_volume, nucleus_area, nucleus_volume, mcv), 
                   ~ mean(., na.rm = TRUE)), 
            .groups = "drop") %>%
  rename(
    "Cell area" = cell_area,
    "Cell volume" = cell_volume,
    "Nucleus area" = nucleus_area,
    "Nucleus volume" = nucleus_volume,
    "MCV" = mcv
  )

# Min-max scaling of each variable across all species
summary_data <- summary_data %>%
  mutate(across(c("Cell area", "Cell volume", "Nucleus area", "Nucleus volume", "MCV"), 
                ~ (. - min(., na.rm = TRUE)) / (max(., na.rm = TRUE) - min(., na.rm = TRUE))))
# Script section adapted by FPLeiva after RMolinaVenegas (thanks, Rafa)
summary_data
## # A tibble: 660 × 6
##    species_underscored `Cell area` `Cell volume` `Nucleus area` `Nucleus volume`
##    <chr>                     <dbl>         <dbl>          <dbl>            <dbl>
##  1 Abalistes_stellatus      0.0298      NaN             0.0310         NaN      
##  2 Abramis_brama            0.0465        0.0141      NaN              NaN      
##  3 Abudefduf_saxatilis      0.0415        0.0128        0.00995        NaN      
##  4 Abudefduf_septemfa…    NaN           NaN           NaN                0.00549
##  5 Abudefduf_sordidus       0.0738        0.0285        0.120            0.0561 
##  6 Abudefduf_taurus         0.0460        0.0149      NaN              NaN      
##  7 Abudefduf_vaigiens…      0.0769        0.0309        0.119            0.0538 
##  8 Acanthocybium_sola…      0.0618        0.0193      NaN              NaN      
##  9 Acanthogobius_hasta      0.0657        0.0230        0.110            0.0765 
## 10 Acanthopagrus_aust…    NaN           NaN           NaN                0.0123 
## # ℹ 650 more rows
## # ℹ 1 more variable: MCV <dbl>
unique(summary_data$species_underscored)
##   [1] "Abalistes_stellatus"             "Abramis_brama"                  
##   [3] "Abudefduf_saxatilis"             "Abudefduf_septemfasciatus"      
##   [5] "Abudefduf_sordidus"              "Abudefduf_taurus"               
##   [7] "Abudefduf_vaigiensis"            "Acanthocybium_solandri"         
##   [9] "Acanthogobius_hasta"             "Acanthopagrus_australis"        
##  [11] "Acanthopagrus_butcheri"          "Acanthostracion_polygonius"     
##  [13] "Acanthostracion_quadricornis"    "Acanthurus_bahianus"            
##  [15] "Acanthurus_chirurgus"            "Acanthurus_coeruleus"           
##  [17] "Acanthurus_gahhm"                "Acanthurus_grammoptilus"        
##  [19] "Acipenser_brevirostrum"          "Acipenser_oxyrinchus"           
##  [21] "Acipenser_sinensis"              "Acipenser_sturio"               
##  [23] "Aeoliscus_strigatus"             "Aetobatus_narinari"             
##  [25] "Albula_vulpes"                   "Aldrichetta_forsteri"           
##  [27] "Alosa_fallax"                    "Alphestes_afer"                 
##  [29] "Aluterus_schoepfii"              "Aluterus_scriptus"              
##  [31] "Ameiurus_catus"                  "Ammodytes_tobianus"             
##  [33] "Amphiprion_akindynos"            "Amphiprion_clarkii"             
##  [35] "Anabas_testudineus"              "Anguilla_bicolor"               
##  [37] "Anguilla_japonica"               "Anguilla_marmorata"             
##  [39] "Anguilla_rostrata"               "Aplodactylus_arctidens"         
##  [41] "Apogon_maculatus"                "Aprion_virescens"               
##  [43] "Aptychotrema_rostrata"           "Archosargus_rhomboidalis"       
##  [45] "Arothron_manilensis"             "Arripis_trutta"                 
##  [47] "Astyanax_lineatus"               "Astyanax_mexicanus"             
##  [49] "Atheresthes_evermanni"           "Atractosteus_tristoechus"       
##  [51] "Aulacocephalus_temminckii"       "Auxis_thazard"                  
##  [53] "Bairdiella_ronchus"              "Balistapus_undulatus"           
##  [55] "Balistes_capriscus"              "Balistes_carolinensis"          
##  [57] "Balistes_vetula"                 "Barbatula_barbatula"            
##  [59] "Barbatula_toni"                  "Basilichthys_australis"         
##  [61] "Bathygobius_soporator"           "Bathyraja_parmifera"            
##  [63] "Bathytoshia_centroura"           "Belone_belone"                  
##  [65] "Bentartia_pusillum"              "Betta_splendens"                
##  [67] "Boops_boops"                     "Boreogadus_saida"               
##  [69] "Bothus_lunatus"                  "Bovichtus_angustifrons"         
##  [71] "Brachygenys_chrysargyrea"        "Brevoortia_tyrannus"            
##  [73] "Brycon_hilarii"                  "Bujurquina_vittata"             
##  [75] "Caelorinchus_innotabilis"        "Caesio_cuning"                  
##  [77] "Calamus_bajonado"                "Calamus_calamus"                
##  [79] "Calamus_penna"                   "Calamus_pennatula"              
##  [81] "Callionymus_lyra"                "Cantherhines_pullus"            
##  [83] "Canthigaster_bennetti"           "Canthigaster_valentini"         
##  [85] "Carangoides_bartholomaei"        "Carangoides_ruber"              
##  [87] "Caranx_carangus"                 "Caranx_hippos"                  
##  [89] "Caranx_ignobilis"                "Caranx_latus"                   
##  [91] "Caranx_lugubris"                 "Caranx_sexfasciatus"            
##  [93] "Carassius_auratus"               "Carassius_carassius"            
##  [95] "Carassius_gibelio"               "Carcharhinus_brachyurus"        
##  [97] "Carcharhinus_falciformis"        "Carcharhinus_leucas"            
##  [99] "Carcharhinus_maculipinnis"       "Carcharhinus_melanopterus"      
## [101] "Carcharhinus_milberti"           "Carcharhinus_obscurus"          
## [103] "Carcharhinus_plumbeus"           "Catostomus_catostomus"          
## [105] "Catostomus_commersonii"          "Centriscops_humerosus"          
## [107] "Centropomus_undecimalis"         "Centropyge_bicolor"             
## [109] "Centroscymnus_coelolepis"        "Centroscymnus_crepidater"       
## [111] "Centroscymnus_owstoni"           "Cephalopholis_cruentata"        
## [113] "Cephalopholis_fulva"             "Cephalopholis_miniata"          
## [115] "Chaetodipterus_faber"            "Chaetodon_capistratus"          
## [117] "Chaetodon_lunulatus"             "Chaetodon_ocellatus"            
## [119] "Chaetodon_rainfordi"             "Chaetodon_sedentarius"          
## [121] "Chaetodon_striatus"              "Channa_argus"                   
## [123] "Channa_punctata"                 "Channa_striata"                 
## [125] "Cheilinus_trilobatus"            "Chelidonichthys_cuculus"        
## [127] "Chelidonichthys_lucerna"         "Chelon_ramada"                  
## [129] "Chilomycterus_spinosus"          "Chiloscyllium_punctatum"        
## [131] "Chirocentrus_dorab"              "Chloroscombrus_chrysurus"       
## [133] "Choerodon_albigena"              "Choerodon_cephalotes"           
## [135] "Choerodon_fasciatus"             "Chromis_analis"                 
## [137] "Chromis_viridis"                 "Chrosomus_neogaeus"             
## [139] "Chrysiptera_cyanea"              "Cichlasoma_dimerus"             
## [141] "Ciliata_mustela"                 "Cirrhinus_mrigala"              
## [143] "Cirrhinus_reba"                  "Clarias_batrachus"              
## [145] "Clarias_gariepinus"              "Clupea_harengus"                
## [147] "Clupeonella_cultriventris"       "Cobitis_biwae"                  
## [149] "Cobitis_striata"                 "Cobitis_taenia"                 
## [151] "Cobitis_takatsuensis"            "Coelorinchus_maurofasciatus"    
## [153] "Colossoma_macropomum"            "Conger_conger"                  
## [155] "Contusus_brevicaudus"            "Coregonus_clupeaformis"         
## [157] "Coregonus_maraena"               "Coreius_guichenoti"             
## [159] "Coris_batuensis"                 "Coryphaena_hippurus"            
## [161] "Coryphaenoides_serrulatus"       "Corythoichthys_intestinalis"    
## [163] "Cottus_gobio"                    "Crossosalarias_macrospilus"     
## [165] "Cryptacanthodes_maculatus"       "Cryptocentrus_leptocephalus"    
## [167] "Ctenopharyngodon_idella"         "Cyclopteropsis_jordani"         
## [169] "Cyclopterus_lumpus"              "Cyprinus_carpio"                
## [171] "Dactylopterus_volitans"          "Danio_rerio"                    
## [173] "Dascyllus_aruanus"               "Datnioides_polota"              
## [175] "Delminichthys_ghetaldii"         "Diagramma_labiosum"             
## [177] "Diagramma_picta"                 "Diapterus_rhombeus"             
## [179] "Diastobranchus_capensis"         "Dicentrarchus_labrax"           
## [181] "Diodon_holocanthus"              "Diplodus_argenteus"             
## [183] "Diplodus_vulgaris"               "Dipturus_batis"                 
## [185] "Dipturus_chilensis"              "Dipturus_laevis"                
## [187] "Diretmichthys_parini"            "Dischistodus_prosopotaenia"     
## [189] "Dissostichus_mawsoni"            "Dormitator_latifrons"           
## [191] "Drepane_punctata"                "Echeneis_naucrates"             
## [193] "Ecsenius_mandibularis"           "Ecsenius_yaeyamaensis"          
## [195] "Electrophorus_electricus"        "Ellochelon_vaigiensis"          
## [197] "Elopichthys_bambusa"             "Elops_saurus"                   
## [199] "Engraulis_anchoita"              "Engraulis_encrasicolus"         
## [201] "Epalzeorhynchos_bicolor"         "Epalzeorhynchos_frenatum"       
## [203] "Epinephelus_adscensionis"        "Epinephelus_cyanopodus"         
## [205] "Epinephelus_fasciatus"           "Epinephelus_guttatus"           
## [207] "Epinephelus_merra"               "Epinephelus_ongus"              
## [209] "Epinephelus_quoyans"             "Epinephelus_spilotoceps"        
## [211] "Epinephelus_striatus"            "Equetus_pulcher"                
## [213] "Esox_lucius"                     "Esox_niger"                     
## [215] "Etmopterus_brachyurus"           "Etmopterus_granulosus"          
## [217] "Eucinostomus_argenteus"          "Eucinostomus_gula"              
## [219] "Eugerres_plumieri"               "Eumicrotremus_spinosus"         
## [221] "Eupomacentrus_fuscus"            "Eupomacentrus_leucostictus"     
## [223] "Eupomacentrus_variabilis"        "Euristhmus_lepturus"            
## [225] "Euthynnus_alletteratus"          "Eutrigla_gurnardus"             
## [227] "Farlowella_acus"                 "Fistularia_petimba"             
## [229] "Fluvitrygon_signifer"            "Gadus_morhua"                   
## [231] "Gaidropsarus_ensis"              "Gaidropsarus_mediterraneus"     
## [233] "Galaxias_maculatus"              "Galaxias_olidus"                
## [235] "Galeocerdo_cuvier"               "Gambusia_holbrooki"             
## [237] "Gasterosteus_aculeatus"          "Geotria_australis"              
## [239] "Gerres_cinereus"                 "Gerres_filamentosus"            
## [241] "Gerres_subfasciatus"             "Ginglymostoma_cirratum"         
## [243] "Girella_elevata"                 "Girella_zebra"                  
## [245] "Glyptocephalus_cynoglossus"      "Glyptosternon_maculatum"        
## [247] "Gnathanodon_speciosus"           "Gobio_gobio"                    
## [249] "Gobiocypris_rarus"               "Gobiodon_citrinus"              
## [251] "Gobius_cobitis"                  "Gymnelus_viridis"               
## [253] "Gymnocephalus_cernua"            "Gymnocranius_audleyi"           
## [255] "Gymnocypris_eckloni"             "Gymnothorax_funebris"           
## [257] "Gymnothorax_pictus"              "Gymnothorax_vicinus"            
## [259] "Gymnotus_inaequilabiatus"        "Gyrinocheilus_aymonieri"        
## [261] "Haemulon_aurolineatum"           "Haemulon_flavolineatum"         
## [263] "Haemulon_plumierii"              "Haemulon_sciurus"               
## [265] "Halargyreus_johnsonii"           "Halichoeres_biocellatus"        
## [267] "Halichoeres_bivittatus"          "Halichoeres_garnoti"            
## [269] "Halichoeres_radiatus"            "Harengula_humeralis"            
## [271] "Helicolenus_barathri"            "Helicolenus_percoides"          
## [273] "Hemiglyphidodon_plagiometopon"   "Hemiramphus_brasiliensis"       
## [275] "Hemiscyllium_ocellatum"          "Hemitripterus_americanus"       
## [277] "Hemitrygon_bennettii"            "Heterodontus_francisci"         
## [279] "Heteropneustes_fossilis"         "Heterotis_niloticus"            
## [281] "Hippocampus_abdominalis"         "Hippoglossus_hippoglossus"      
## [283] "Hirundichthys_affinis"           "Holacanthus_bermudensis"        
## [285] "Holacanthus_ciliaris"            "Holacanthus_tricolor"           
## [287] "Holocentrus_ascensionis"         "Holocentrus_rufus"              
## [289] "Hoplias_malabaricus"             "Hoplisoma_metae"                
## [291] "Hoplisoma_paleatus"              "Hucho_hucho"                    
## [293] "Huso_huso"                       "Hypanus_americanus"             
## [295] "Hypophthalmichthys_molitrix"     "Hypophthalmichthys_nobilis"     
## [297] "Hypoplectrus_unicolor"           "Hyporhamphus_melanochir"        
## [299] "Hypostomus_boulengeri"           "Hypostomus_plecostomus"         
## [301] "Icelus_spatula"                  "Ictalurus_punctatus"            
## [303] "Idiacanthus_atlanticus"          "Iranocichla_hormuzensis"        
## [305] "Istigobius_rigilius"             "Isurus_oxyrinchus"              
## [307] "Jenynsia_lineata"                "Kathetostoma_canaster"          
## [309] "Katsuwonus_pelamis"              "Konosirus_punctatus"            
## [311] "Labeo_catla"                     "Labeo_chrysophekadion"          
## [313] "Labeo_rohita"                    "Labrisomus_nuchipinnis"         
## [315] "Lachnolaimus_maximus"            "Lactophrys_trigonus"            
## [317] "Lagocephalus_lunaris"            "Lamna_nasus"                    
## [319] "Lampetra_fluviatilis"            "Lampetra_planeri"               
## [321] "Lampris_regius"                  "Lates_calcarifer"               
## [323] "Lefua_echigonia"                 "Lefua_nikkonis"                 
## [325] "Leiopotherapon_unicolor"         "Lepidopsetta_bilineata"         
## [327] "Lepomis_macrochirus"             "Lethrinus_atkinsoni"            
## [329] "Lethrinus_miniatus"              "Lethrinus_nebulosus"            
## [331] "Lethrinus_rubrioperculatus"      "Leucaspius_delineatus"          
## [333] "Leuciscus_idus"                  "Leucoraja_erinaceus"            
## [335] "Leucoraja_ocellata"              "Limanda_aspera"                 
## [337] "Limanda_limanda"                 "Liparis_tunicatus"              
## [339] "Lipophrys_pholis"                "Lophius_americanus"             
## [341] "Lophius_piscatorius"             "Lutjanus_adetii"                
## [343] "Lutjanus_analis"                 "Lutjanus_apodus"                
## [345] "Lutjanus_carponotatus"           "Lutjanus_cyanopterus"           
## [347] "Lutjanus_fulviflamma"            "Lutjanus_griseus"               
## [349] "Lutjanus_lutjanus"               "Lutjanus_russellii"             
## [351] "Lutjanus_sebae"                  "Lutjanus_synagris"              
## [353] "Lutjanus_vitta"                  "Lutjanus_vivanus"               
## [355] "Lycodichthys_dearborni"          "Macropodus_opercularis"         
## [357] "Macrourus_berglax"               "Makaira_nigricans"              
## [359] "Megaleporinus_macrocephalus"     "Megalobrama_amblycephala"       
## [361] "Megalops_cyprinoides"            "Melanogrammus_aeglefinus"       
## [363] "Merlangius_merlangus"            "Merluccius_bilinearis"          
## [365] "Merluccius_hubbsi"               "Merluccius_merluccius"          
## [367] "Mesogobius_batrachocephalus"     "Mesovagus_antipodum"            
## [369] "Metynnis_hypsauchen"             "Metynnis_maculatus"             
## [371] "Micropogonias_furnieri"          "Micropterus_coosae"             
## [373] "Micropterus_salmoides"           "Microspathodon_chrysurus"       
## [375] "Misgurnus_anguillicaudatus"      "Mola_mola"                      
## [377] "Monopterus_albus"                "Morone_americana"               
## [379] "Morone_saxatilis"                "Mugil_cephalus"                 
## [381] "Mugil_curema"                    "Mugil_liza"                     
## [383] "Mulloidichthys_martinicus"       "Mulloidichthys_vanicolensis"    
## [385] "Mullus_barbatus"                 "Mullus_surmuletus"              
## [387] "Muraenesox_cinereus"             "Mustelus_canis"                 
## [389] "Myoxocephalus_octodecemspinosus" "Myoxocephalus_quadricornis"     
## [391] "Myoxocephalus_scorpius"          "Myripristis_jacobus"            
## [393] "Mystus_vittatus"                 "Myxine_glutinosa"               
## [395] "Myxocyprinus_asiaticus"          "Myxus_elongatus"                
## [397] "Myzopsetta_ferruginea"           "Nebrius_ferrugineus"            
## [399] "Nectamia_savayensis"             "Negaprion_brevirostris"         
## [401] "Nematalosa_come"                 "Neoceratodus_forsteri"          
## [403] "Neocyttus_rhomboidalis"          "Neogobius_melanostomus"         
## [405] "Neoscopelus_macrolepidotus"      "Neotrygon_kuhlii"               
## [407] "Niwaella_delicata"               "Notolabrus_tetricus"            
## [409] "Notopterus_notopterus"           "Novaculichthys_taeniourus"      
## [411] "Nuchequula_decora"               "Ocyurus_chrysurus"              
## [413] "Odontesthes_argentinensis"       "Ogcocephalus_vespertilio"       
## [415] "Oligoplites_saurus"              "Oncorhynchus_keta"              
## [417] "Oncorhynchus_kisutch"            "Oncorhynchus_mykiss"            
## [419] "Oncorhynchus_tshawytscha"        "Ophichthus_cephalozona"         
## [421] "Oplopomus_oplopomus"             "Opsanus_tau"                    
## [423] "Orectolobus_ornatus"             "Oreochromis_mossambicus"        
## [425] "Oreochromis_niloticus"           "Osmerus_eperlanus"              
## [427] "Osmerus_mordax"                  "Ostorhinchus_cookii"            
## [429] "Ostorhinchus_endekataenia"       "Ostorhinchus_guamensis"         
## [431] "Oxynotus_bruniensis"             "Oxynotus_centrina"              
## [433] "Pachypanchax_playfairii"         "Pagellus_bogaraveo"             
## [435] "Pagellus_erythrinus"             "Pagrus_auratus"                 
## [437] "Pangasianodon_hypophthalmus"     "Parachanna_obscura"             
## [439] "Paragobiodon_xanthosoma"         "Paralichthys_lethostigma"       
## [441] "Paralichthys_olivaceus"          "Paramugil_georgii"              
## [443] "Parapercis_cylindrica"           "Parapercis_hexophtalma"         
## [445] "Parupeneus_forsskali"            "Pelates_quadrilineatus"         
## [447] "Pempheris_schomburgkii"          "Perca_flavescens"               
## [449] "Perca_fluviatilis"               "Petromyzon_marinus"             
## [451] "Petroscirtes_fallax"             "Petroscirtes_lupus"             
## [453] "Petroscirtes_mitratus"           "Phoxinus_phoxinus"              
## [455] "Piaractus_brachypomus"           "Piaractus_mesopotamicus"        
## [457] "Pimelodella_gracilis"            "Plagioscion_squamosissimus"     
## [459] "Plagiotremus_rhinorhynchos"      "Planiliza_macrolepis"           
## [461] "Platichthys_flesus"              "Platybelone_argala"             
## [463] "Platycephalus_bassensis"         "Platycephalus_indicus"          
## [465] "Plectropomus_leopardus"          "Pleuronectes_platessa"          
## [467] "Podothecus_accipenserinus"       "Poecilia_mexicana"              
## [469] "Poecilia_reticulata"             "Pollachius_virens"              
## [471] "Polypterus_palmas"               "Polypterus_senegalus"           
## [473] "Pomacanthus_arcuatus"            "Pomacanthus_paru"               
## [475] "Pomacentrus_nagasakiensis"       "Pomadasys_kaakan"               
## [477] "Porichthys_porosissimus"         "Premnas_biaculeatus"            
## [479] "Priacanthus_tayenus"             "Prionace_glauca"                
## [481] "Prionotus_carolinus"             "Prionotus_evolans"              
## [483] "Pristiapogon_kallopterus"        "Prochilodus_lineatus"           
## [485] "Prognathodes_aculeatus"          "Proscymnodon_plunketi"          
## [487] "Prosopium_cylindraceum"          "Protopterus_aethiopicus"        
## [489] "Protopterus_annectens"           "Psalidodon_anisitsi"            
## [491] "Psettodes_erumei"                "Pseudaphritis_urvillii"         
## [493] "Pseudocaranx_dentex"             "Pseudomonacanthus_peroni"       
## [495] "Pseudoplatystoma_corruscans"     "Pseudopleuronectes_americanus"  
## [497] "Pseudorhombus_jenynsii"          "Pseudupeneus_maculatus"         
## [499] "Pterois_volitans"                "Pterophyllum_scalare"           
## [501] "Pterygoplichthys_pardalis"       "Pungitius_pungitius"            
## [503] "Pygocentrus_nattereri"           "Rachycentron_canadum"           
## [505] "Raja_clavata"                    "Raja_montagui"                  
## [507] "Rastrelliger_kanagurta"          "Repomucenus_limiceps"           
## [509] "Rhamdia_quelen"                  "Rhamphichthys_rostratus"        
## [511] "Rhinesomus_triqueter"            "Rhizoprionodon_terraenovae"     
## [513] "Rhynchocypris_lagowskii"         "Rita_rita"                      
## [515] "Rostroraja_eglanteria"           "Rutilus_kutum"                  
## [517] "Rutilus_rutilus"                 "Rypticus_saponaceus"            
## [519] "Salminus_affinis"                "Salmo_caspius"                  
## [521] "Salmo_salar"                     "Salmo_trutta"                   
## [523] "Salvelinus_alpinus"              "Salvelinus_fontinalis"          
## [525] "Salvelinus_namaycush"            "Salvelinus_umbla"               
## [527] "Sander_vitreus"                  "Sarda_australis"                
## [529] "Sardina_pilchardus"              "Sardinella_gibbosa"             
## [531] "Sarpa_salpa"                     "Scardinius_erythrophthalmus"    
## [533] "Scarus_coeruleus"                "Scarus_croicensis"              
## [535] "Scarus_ghobban"                  "Scarus_guacamaia"               
## [537] "Scarus_schlegeli"                "Scarus_taeniopterus"            
## [539] "Schizopyge_niger"                "Schizothorax_plagiostomus"      
## [541] "Schizothorax_prenanti"           "Scleropages_jardinii"           
## [543] "Scolopsis_monogramma"            "Scolopsis_vosmeri"              
## [545] "Scomber_scombrus"                "Scomberomorus_regalis"          
## [547] "Scophthalmus_maximus"            "Scophthalmus_rhombus"           
## [549] "Scorpaena_cardinalis"            "Scorpaena_plumieri"             
## [551] "Scorpaena_porcus"                "Scorpaenopsis_oxycephalus"      
## [553] "Scorpis_aequipinnis"             "Scyliorhinus_canicula"          
## [555] "Scyliorhinus_stellaris"          "Sebastes_alutus"                
## [557] "Sebastes_marinus"                "Sebastes_ocutalus"              
## [559] "Sebastes_polyspinis"             "Sebastes_schlegelii"            
## [561] "Selene_vomer"                    "Selenotoca_multifasciata"       
## [563] "Semotilus_corporalis"            "Seriola_hippos"                 
## [565] "Seriola_lalandi"                 "Seriola_quinqueradiata"         
## [567] "Serrasalmus_eigenmanni"          "Siganus_doliatus"               
## [569] "Siganus_fuscescens"              "Siganus_lineatus"               
## [571] "Siganus_spinus"                  "Siganus_sutor"                  
## [573] "Signigobius_biocellatus"         "Sillaginodes_punctatus"         
## [575] "Sillago_analis"                  "Silurus_asotus"                 
## [577] "Siniperca_chuatsi"               "Solea_senegalensis"             
## [579] "Solea_solea"                     "Soleichthys_heterorhinos"       
## [581] "Sorubim_cuspicaudus"             "Sorubim_lima"                   
## [583] "Sparisoma_aurofrenatum"          "Sparisoma_chrysopterum"         
## [585] "Sparisoma_radians"               "Sparisoma_viride"               
## [587] "Sparus_aurata"                   "Sphoeroides_greeleyi"           
## [589] "Sphoeroides_maculatus"           "Sphoeroides_spengleri"          
## [591] "Sphoeroides_testudineus"         "Sphyraena_barracuda"            
## [593] "Sphyraena_obtusata"              "Sphyrna_lewini"                 
## [595] "Sphyrna_mokarran"                "Sphyrna_tiburo"                 
## [597] "Sphyrna_tudes"                   "Sphyrna_zygaena"                
## [599] "Spicara_maena"                   "Sprattus_sprattus"              
## [601] "Squalius_cephalus"               "Squalus_acanthias"              
## [603] "Squatina_australis"              "Squatina_squatina"              
## [605] "Stenotomus_chrysops"             "Stephanolepis_hispida"          
## [607] "Sufflamen_fraenatus"             "Symphodus_tinca"                
## [609] "Synbranchus_marmoratus"          "Syngnathus_fuscus"              
## [611] "Syngnathus_scovelli"             "Syngnathus_typhle"              
## [613] "Synodontis_notatus"              "Synodus_intermedius"            
## [615] "Synodus_sageneus"                "Tachysurus_fulvidraco"          
## [617] "Taurulus_bubalis"                "Tautoga_onitis"                 
## [619] "Terapon_jarbua"                  "Terapon_puta"                   
## [621] "Tetrabrachium_ocellatum"         "Tetraodon_nigroviridis"         
## [623] "Tetronarce_nobiliana"            "Thalassoma_bifasciatum"         
## [625] "Thalassoma_klunzingeri"          "Thalassoma_lucasanum"           
## [627] "Thalassoma_lunare"               "Thunnus_alalunga"               
## [629] "Thunnus_albacares"               "Thunnus_atlanticus"             
## [631] "Thymallus_arcticus"              "Thymallus_thymallus"            
## [633] "Tinca_tinca"                     "Torpedo_torpedo"                
## [635] "Trachinocephalus"                "Trachinotus_botla"              
## [637] "Trachinotus_coppingeri"          "Trachinotus_falcatus"           
## [639] "Trachinus_draco"                 "Trachurus_trachurus"            
## [641] "Trachystoma_petardi"             "Trematomus_bernacchii"          
## [643] "Tripodichthys_angustifrons"      "Trisopterus_luscus"             
## [645] "Turrum_fulvoguttatum"            "Turrum_gymnostethus"            
## [647] "Tylosurus_gavialoides"           "Ucla_xenogrammus"               
## [649] "Ulua_aurochs"                    "Umbrina_coroides"               
## [651] "Upeneichthys_lineatus"           "Uranoscopus_scaber"             
## [653] "Urophycis_tenuis"                "Valenciennea_longipinnis"       
## [655] "Xiphias_gladius"                 "Xiphophorus_hellerii"           
## [657] "Zebrasoma_scopas"                "Zenopsis_nebulosus"             
## [659] "Zeus_faber"                      "Zoarces_americanus"
# Identify species to drop
species_to_drop <- setdiff(tree$tip.label, summary_data$species_underscored)

# Prune the tree
tree <- drop.tip(tree, species_to_drop)

# Align data and tree
datF <- summary_data %>%
  column_to_rownames("species_underscored")

dat_tree <- datF %>%
  filter(row.names(.) %in% tree$tip.label)

# Add missing species
missing_species <- setdiff(tree$tip.label, row.names(dat_tree))
dat_tree_NA <- data.frame(matrix(NA, nrow = length(missing_species), ncol = ncol(datF)))
row.names(dat_tree_NA) <- missing_species
colnames(dat_tree_NA) <- colnames(datF)
Data <- rbind(dat_tree, dat_tree_NA)
Data <- Data[match(tree$tip.label, row.names(Data)), ]
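The alignment above relies on `match()` returning, for each tip label, the row position of that species in the data frame. The following is a minimal base-R sketch of the same idea on made-up values; `tips`, `toy`, and the numbers in them are hypothetical and are not objects from this analysis. It shows that a tip without data ends up as a row of `NA` and that the final row order follows the tip order.

```r
# Hypothetical tip order, standing in for tree$tip.label
tips <- c("Salmo_salar", "Danio_rerio", "Esox_lucius")

# Hypothetical trait table: different row order, one species without data
toy <- data.frame(cell_area = c(80, 120),
                  row.names = c("Esox_lucius", "Salmo_salar"))

# Pad with NA rows for tips that have no data
missing <- setdiff(tips, row.names(toy))
pad <- data.frame(matrix(NA, nrow = length(missing), ncol = ncol(toy)))
row.names(pad) <- missing
colnames(pad) <- colnames(toy)
toy_all <- rbind(toy, pad)

# Reorder rows so that they follow the tip order
toy_all <- toy_all[match(tips, row.names(toy_all)), , drop = FALSE]

# Rows and tips should now line up one-to-one
stopifnot(identical(row.names(toy_all), tips))
toy_all
```

A check of this kind, `identical(row.names(Data), tree$tip.label)`, could be run on the real objects before calling `gheatmap()`, since a mismatch would otherwise go unnoticed.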
# Plot without species names
circ <- ggtree(tree, layout = "fan", open.angle = 15, branch.length = "none")
circ <- rotate_tree(circ, 90)
circ

# Create a new plot with heatmap for each trait using a single scale
tree_data <- gheatmap(
  circ, 
  Data, 
  width = 0.2, 
  offset = 0,  # Offset for placing the heatmap
  colnames_offset_x = 0, 
  colnames_offset_y = 0, 
  font.size = 0,
  hjust = 0
)
tree_data

# Apply the same scale for all traits
tree_data <- tree_data +
  scale_fill_viridis_c(option = "H", name = "Normalised Cell Traits", na.value = "white") +
  theme(
    legend.position = c(0.58, 0.5),
    legend.title.position = "top",  
    legend.title.align = 0.5,  
    legend.direction = "horizontal",  
    legend.background = element_rect(fill = "transparent", color = NA),
    legend.key = element_rect(fill = "transparent", color = NA),
    legend.box.background = element_rect(fill = "transparent", color = NA),
    panel.background = element_rect(fill = "transparent", color = NA),    # Transparent panel
    plot.background = element_rect(fill = "transparent", color = NA),     # Transparent plot
    panel.grid.major = element_blank(),                                   # No grid lines
    panel.grid.minor = element_blank()
  )

tree_data

ggsave("../manuscript/Figure_4.png", tree_data, width = 12, height = 12)
ggsave("../manuscript/Figure_4.pdf", tree_data, width = 12, height = 12)
# Plot with species names
circ_names <- ggtree(tree, layout = "fan", open.angle = 15) + 
  geom_tiplab(offset = 0.22, hjust = 0, size = 0.8)
circ_names <- rotate_tree(circ_names, 90)
# Create a new plot with heatmap for each trait using a single scale
tree_data_names <- gheatmap(
  circ_names, 
  Data, 
  width = 0.2, 
  offset = 0,  # Offset for placing the heatmap
  colnames_offset_x = 0, 
  colnames_offset_y = 0, 
  font.size = 4, 
  hjust = 0
)
tree_data_names

# Apply the same scale for all traits
tree_data_names <- tree_data_names +
  scale_fill_viridis_c(option = "H", name = "Normalised Cell Traits", na.value = "grey90") +
  theme(
    legend.position = c(0.6, 0.5),
    legend.title.position = "top",
    legend.title.align = 0.5,
    legend.direction = "horizontal",
    legend.key = element_blank(),
    legend.background=element_blank(),
    legend.key.width = unit(1, "cm"),
    legend.key.height = unit(0.7, "cm")
  )

tree_data_names

Figure_5: Most common species in the database

# Counting Studies Per Species
species_by_studies <- dat %>%
  group_by(species) %>%
  summarise(num_studies = n_distinct(key)) %>%
  arrange(desc(num_studies)) %>%
  slice_head(n = 15)  # Limit to top 15 species based on the number of studies

plot_studies <- ggplot(species_by_studies, aes(x = reorder(species, num_studies), y = num_studies)) +
  geom_bar(stat = "identity", fill = "#00AFBB", width = 0.7) +
  coord_flip() +
  theme_pubr() +
  labs(
    x = "Species",
    y = "Number of Studies"
  ) +
  theme(
    axis.title.x = element_text(face = "bold", size = 12),
    axis.title.y = element_text(face = "bold", size = 12),
    axis.text.y = element_text(size = 10, face = "italic")  # Italicise species names
  )
plot_studies

# ------------------------------------------------------------------------------
# Counting Records per Species
species_by_records <- dat %>%
  group_by(species) %>%
  summarise(num_records = n()) %>%
  arrange(desc(num_records)) %>%
  slice_head(n = 15)  # Limit to top 15 species based on the number of records

plot_records <- ggplot(species_by_records, aes(x = reorder(species, num_records), y = num_records)) +
  geom_bar(stat = "identity", fill = "#009E73", width = 0.7) +
  coord_flip() +
  theme_pubr() +
  labs(
    x = "Species",
    y = "Number of Records"
  ) +
  theme(
    axis.title.x = element_text(face = "bold", size = 12),
    axis.title.y = element_text(face = "bold", size = 12),
    axis.text.y = element_text(size = 10, face = "italic")  # Italicise species names
  )
plot_records

# ------------------------------------------------------------------------------
# Combine Plots
Figure_5 <- plot_grid(
  plot_studies, 
  plot_records,
  labels = c("A", "B"),
  nrow = 2,
  ncol = 1,
  label_size = 15,
  align = "hv"
)

# Store Plots
ggsave('../manuscript/Figure_5.pdf', Figure_5, width = 7, height = 9)
ggsave('../manuscript/Figure_5.png', Figure_5, width = 7, height = 9, dpi = 1200)
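
The two panels of Figure 5 can rank species differently because `n_distinct(key)` counts each study once per species, whereas `n()` counts every row. A single study that reports many measurements therefore raises the record count but not the study count. The sketch below illustrates the distinction in base R with invented data; the data frame `toy` and its values are hypothetical.

```r
# Hypothetical records: one row per measurement, `key` identifies the source study
toy <- data.frame(
  species = c("A", "A", "A", "A", "B", "B"),
  key     = c("s1", "s1", "s1", "s1", "s2", "s3")
)

# Records per species: every row counts
num_records <- table(toy$species)

# Studies per species: each distinct key counts once
num_studies <- tapply(toy$key, toy$species, function(k) length(unique(k)))

num_records  # A = 4, B = 2
num_studies  # A = 1, B = 2
```

Here species A leads by records but trails by studies, which is the same kind of reordering seen between the two rankings.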

Figure_6: Number of species and percentage by Order

# Total number of species in the database
n_spp <- dat %>%
  distinct(species) %>%
  nrow()
n_spp
## [1] 660
# Counting the most common species (by number of studies)
dat %>%
  group_by(species) %>%
  summarise(num_studies = n_distinct(key)) %>%
  arrange(desc(num_studies)) %>%
  slice_head(n = 15)  # Limit to top 15 species based on the number of studies
## # A tibble: 15 × 2
##    species                 num_studies
##    <chr>                         <int>
##  1 Oncorhynchus mykiss              17
##  2 Cyprinus carpio                  12
##  3 Labeo rohita                     10
##  4 Oreochromis niloticus             8
##  5 Ctenopharyngodon idella           7
##  6 Carassius auratus                 6
##  7 Channa punctata                   6
##  8 Clarias gariepinus                6
##  9 Salmo trutta                      6
## 10 Clarias batrachus                 5
## 11 Lophius piscatorius               5
## 12 Salmo salar                       5
## 13 Tinca tinca                       5
## 14 Dicentrarchus labrax              4
## 15 Echeneis naucrates                4
# ------------------------------------------------------------------------------
# Counting records by species
dat %>%
  group_by(species) %>%
  summarise(num_records = n()) %>%
  arrange(desc(num_records)) %>%
  slice_head(n = 15)  # Limit to top 15 species based on the number of records
## # A tibble: 15 × 2
##    species                     num_records
##    <chr>                             <int>
##  1 Ctenopharyngodon idella              82
##  2 Oreochromis niloticus                71
##  3 Hypophthalmichthys molitrix          63
##  4 Labeo rohita                         29
##  5 Oncorhynchus mykiss                  24
##  6 Cyprinus carpio                      18
##  7 Dicentrarchus labrax                 17
##  8 Channa punctata                      15
##  9 Salmo salar                          13
## 10 Clarias batrachus                    12
## 11 Abudefduf saxatilis                  11
## 12 Haemulon aurolineatum                11
## 13 Haemulon flavolineatum               11
## 14 Holocentrus ascensionis              11
## 15 Lutjanus griseus                     11
# ------------------------------------------------------------------------------
# Number of species and percentage by Order
spp_order <- dat %>%
  group_by(class, order) %>%
  reframe(count_species_by_order = length(unique(species)), 
          percent_species_by_order = round(count_species_by_order/n_spp * 100, 2)) %>%
  ungroup() %>%
  arrange(desc(count_species_by_order)) %>%
  slice_head(n = 15)  # Limit to the top 15 orders based on the number of species
spp_order
## # A tibble: 15 × 4
##    class          order            count_species_by_order percent_species_by_o…¹
##    <chr>          <chr>                             <int>                  <dbl>
##  1 Actinopterygii Perciformes                         115                  17.4 
##  2 Actinopterygii Cypriniformes                        51                   7.73
##  3 Actinopterygii Tetraodontiform…                     29                   4.39
##  4 Actinopterygii Labriformes                          27                   4.09
##  5 Actinopterygii Carangiformes                        24                   3.64
##  6 Actinopterygii Siluriformes                         24                   3.64
##  7 Actinopterygii Spariformes                          23                   3.48
##  8 Actinopterygii Pleuronectiform…                     20                   3.03
##  9 Chondrichthyes Carcharhiniform…                     20                   3.03
## 10 Actinopterygii Gadiformes                           19                   2.88
## 11 Actinopterygii Salmoniformes                        17                   2.58
## 12 Actinopterygii Syngnathiformes                      17                   2.58
## 13 Actinopterygii Characiformes                        15                   2.27
## 14 Actinopterygii Lutjaniformes                        15                   2.27
## 15 Actinopterygii Clupeiformes                         13                   1.97
## # ℹ abbreviated name: ¹​percent_species_by_order
# ------------------------------------------------------------------------------
plot_orders_numb <- ggplot(spp_order, aes(x = reorder(order, count_species_by_order), y = count_species_by_order)) +
  geom_bar(stat = "identity", fill = "#00AFBB", width = 0.7) +
  coord_flip() +
  theme_pubr() +
  labs(
    x = "Order",
    y = "Number of Species"
  ) +
  theme(
    axis.title.x = element_text(face = "bold", size = 14),
    axis.title.y = element_text(face = "bold", size = 14))
plot_orders_numb

# ------------------------------------------------------------------------------
plot_orders_perc <- ggplot(spp_order, aes(x = reorder(order, percent_species_by_order), y = percent_species_by_order)) +
  geom_bar(stat = "identity", fill = "#00AFBB", width = 0.7) +
  coord_flip() +
  theme_pubr() +
  labs(
    x = "Order",
    y = "Percentage of representation in ErythroCite"
  ) +
  theme(
    axis.title.x = element_text(face = "bold", size = 14),
    axis.title.y = element_text(face = "bold", size = 14))
plot_orders_perc

# Combine plots
Figure_6 <- plot_grid(
  plot_orders_numb, 
  plot_orders_perc,
  labels = c("A", "B"),
  nrow = 2,
  ncol = 1,
  label_size = 15
)

# Store Plots
ggsave('../manuscript/Figure_6.pdf', Figure_6, width = 7, height = 9)
ggsave('../manuscript/Figure_6.png', Figure_6, width = 7, height = 9, dpi = 1200)
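
Because `percent_species_by_order` is computed against `n_spp` (all species in the database) before `slice_head()` is applied, the percentages of the 15 orders shown are shares of the whole database and do not sum to 100. A minimal base-R sketch with a hypothetical species-to-order table (`toy`, `O1` to `O4` are invented) makes this explicit:

```r
# Hypothetical species-to-order table (one row per species)
toy <- data.frame(
  species = paste0("sp", 1:10),
  order   = c(rep("O1", 5), rep("O2", 3), "O3", "O4")
)

n_spp_toy <- length(unique(toy$species))          # denominator: all species
counts <- sort(table(toy$order), decreasing = TRUE)
percent <- round(as.vector(counts) / n_spp_toy * 100, 2)
names(percent) <- names(counts)

top2 <- head(percent, 2)  # analogous to slice_head(), here keeping 2 orders
top2                      # O1 = 50, O2 = 30
sum(top2)                 # 80, not 100: the remaining orders were dropped
```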

Figure_7: Studies and metadata

# Counting Studies Per Life Stage
life_stage_by_studies <- dat %>%
  mutate(life_stage = ifelse(is.na(life_stage), "not reported", life_stage)) %>%
  group_by(life_stage) %>%
  summarise(num_studies = n_distinct(key)) %>%
  arrange(desc(num_studies))

# Plotting Studies by Life Stage
plot_life_stages <- ggplot(life_stage_by_studies, aes(x = reorder(life_stage, num_studies), y = num_studies)) +
  geom_bar(stat = "identity", fill = "#00AFBB", width = 0.7) +
  coord_flip() +
  theme_pubr() +
  labs(
    x = "Life Stage",
    y = "Number of Studies"
  ) +
  theme(
    axis.title.x = element_text(face = "bold", size = 12),
    axis.title.y = element_text(face = "bold", size = 12),
    axis.text.y = element_text(size = 10)  # Adjust font size for life stage names
  )
plot_life_stages

# ------------------------------------------------------------------------------
# Counting Studies Per Sex
sex_by_studies <- dat %>%
  mutate(sex = ifelse(is.na(sex), "not reported", sex)) %>%
  group_by(sex) %>%
  summarise(num_studies = n_distinct(key)) %>%
  arrange(desc(num_studies))

# Plotting Studies by Sex
plot_sex <- ggplot(sex_by_studies, aes(x = reorder(sex, num_studies), y = num_studies)) +
  geom_bar(stat = "identity", fill = "#00AFBB", width = 0.7) +
  coord_flip() +
  theme_pubr() +
  labs(
    x = "Sex",
    y = "Number of Studies"
  ) +
  theme(
    axis.title.x = element_text(face = "bold", size = 12),
    axis.title.y = element_text(face = "bold", size = 12),
    axis.text.y = element_text(size = 10)  # Adjust font size for sex labels
  )
plot_sex

# ------------------------------------------------------------------------------  
# Counting Studies Per Realm
realm_by_studies <- dat %>%
  group_by(realm) %>%
  summarise(num_studies = n_distinct(key)) %>%
  arrange(desc(num_studies))

# Plotting Studies by Realm
plot_realm <- ggplot(realm_by_studies, aes(x = reorder(realm, num_studies), y = num_studies)) +
  geom_bar(stat = "identity", fill = "#00AFBB", width = 0.7) +
  coord_flip() +
  theme_pubr() +
  labs(
    x = "Realms",
    y = "Number of Studies"
  ) +
  theme(
    axis.title.x = element_text(face = "bold", size = 12),
    axis.title.y = element_text(face = "bold", size = 12),
    axis.text.y = element_text(size = 10)  # Adjust font size for realm labels
  )
plot_realm

# ------------------------------------------------------------------------------
# Combine Plots
Figure_7 <- plot_grid(
  plot_sex, 
  plot_life_stages,
  plot_realm,
  labels = c("A", "B", "C"),
  nrow = 3,
  ncol = 1,
  label_size = 15,
  align = "hv"
)

# Store Plots
ggsave('../manuscript/Figure_7.pdf', Figure_7, width = 5, height = 8)
ggsave('../manuscript/Figure_7.png', Figure_7, width = 5, height = 8, dpi = 1200)
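
The life-stage and sex panels recode `NA` as an explicit "not reported" category with `ifelse()`, so that unreported metadata appears as its own bar rather than being dropped. Whether `life_stage` and `sex` are stored as character or factor is not shown here; the sketch below uses a hypothetical vector `sex` and covers both cases. Converting a factor with `as.character()` first is a precaution so that labels rather than integer codes are kept.

```r
# Hypothetical metadata column with unreported values coded as NA
sex <- c("female", NA, "male", NA)

# Character input: NA becomes an explicit category, other values pass through
sex_chr <- ifelse(is.na(sex), "not reported", sex)

# Factor input: convert to character first so that labels are preserved
sex_fct <- factor(sex)
sex_from_fct <- ifelse(is.na(sex_fct), "not reported", as.character(sex_fct))

table(sex_chr)  # female = 1, male = 1, not reported = 2
```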

Figure_8: Cell size traits

# Define colors for each class
cols_class <- c("Actinopterygii"    = "#ef5675",
                "Chondrichthyes"    = "#7a5195",
                "Cyclostomata"      = "#075983",
                "Dipnoi"            = "#ffa600")
# ------------------------------------------------------------------------------
# Clean the data for cell area
dat_clean_cell_area <- dat %>%
  filter(!is.na(cell_area))

# Calculate summary statistics of cell area for each class
df_summary <- dat_clean_cell_area %>%
  group_by(class) %>%
  summarise(
    n_species = n_distinct(species),
    n_obs = n(),
    max_y_cell_area = max(cell_area, na.rm = TRUE))

# Plot for Cell Area
plot_cell_area <- ggplot(dat_clean_cell_area, aes(x = class, y = cell_area, fill = class)) +
  geom_boxplot(width = 0.6, 
               fill = "white",
               outlier.shape = NA) +
  geom_text(data = df_summary, 
            aes(y = max_y_cell_area, label = paste0("N = ", n_obs, "\n(", n_species, " spp.)")), 
            vjust = -0.5, size = 2) +
  theme_pubr() +
  theme(
    legend.position = "none",
    axis.text.x = element_text(size = 6),
    axis.text.y = element_text(size = 6),
    axis.title.x = element_blank(),
    axis.title.y = element_text(size = 10),
    panel.border = element_rect(colour = "black", fill = NA, linewidth = 1)
  ) +
  labs(y = expression("Cell area (" * mu * m^2 * ")")) +
  scale_y_log10(
    breaks = c(10, 20, 50, 150, 350, 1000),
    limits = c(NA, 2000)
  ) +
  scale_fill_manual(values = cols_class) +  # For boxplot fill
  scale_color_manual(values = cols_class) +
  geom_point(
    aes(colour = class),
    size = 1,
    alpha = .5,
    position = position_jitter(
      seed = 1, width = .2
    ))

plot_cell_area

# ------------------------------------------------------------------------------
# Clean the data for cell volume
dat_clean_cell_volume <- dat %>%
  filter(!is.na(cell_volume))

# Calculate summary statistics of cell volume for each class
df_summary <- dat_clean_cell_volume %>%
  group_by(class) %>%
  summarise(
    n_species = n_distinct(species),
    n_obs = n(),
    max_y_cell_volume = max(cell_volume, na.rm = TRUE))

# Plot for Cell Volume
plot_cell_volume <- ggplot(dat_clean_cell_volume, aes(x = class, y = cell_volume, fill = class)) +
  geom_boxplot(width = 0.6, 
               fill = "white",
               outlier.shape = NA) +
  geom_text(data = df_summary, 
            aes(y = max_y_cell_volume, label = paste0("N = ", n_obs, "\n(", n_species, " spp.)")), 
            vjust = -0.5, size = 2) +
  theme_pubr() +
  theme(
    legend.position = "none",
    axis.text.x = element_text(size = 6),
    axis.text.y = element_text(size = 6),
    axis.title.x = element_blank(),
    axis.title.y = element_text(size = 10),  # Reduce font size of the y-axis title
    panel.border = element_rect(colour = "black", fill = NA, linewidth = 1)
  ) +
  labs(y = expression("Cell volume (" * mu * m^3 * ")")) +
  scale_y_log10(
    breaks = c(10, 20, 50, 150, 350, 800, 2000, 6000, 20000),
    limits = c(NA, 40000)
  ) +
  scale_fill_manual(values = cols_class) +  # For boxplot fill
  scale_color_manual(values = cols_class) +
  geom_point(
    aes(colour = class),
    size = 1,
    alpha = .5,
    position = position_jitter(
      seed = 1, width = .2
    ))
plot_cell_volume

# ------------------------------------------------------------------------------
# Clean the data for nucleus area
dat_clean_nucleus_area <- dat %>%
  filter(!is.na(nucleus_area))

# Calculate summary statistics of nucleus area for each class
df_summary <- dat_clean_nucleus_area %>%
  group_by(class) %>%
  summarise(
    n_species = n_distinct(species),
    n_obs = n(),
    max_y_nucleus_area = max(nucleus_area, na.rm = TRUE))

# Plot for Nucleus Area
plot_nucleus_area <- ggplot(dat_clean_nucleus_area, aes(x = class, y = nucleus_area, fill = class)) +
  geom_boxplot(width = 0.6, 
               fill = "white",
               outlier.shape = NA) +
  geom_text(data = df_summary, 
            aes(y = max_y_nucleus_area, label = paste0("N = ", n_obs, "\n(", n_species, " spp.)")), 
            vjust = -0.5, size = 2) +
  theme_pubr() +
  theme(
    legend.position = "none",
    axis.text.x = element_text(size = 6),
    axis.text.y = element_text(size = 6),
    axis.title.x = element_blank(),
    axis.title.y = element_text(size = 10),
    panel.border = element_rect(colour = "black", fill = NA, linewidth = 1)
  ) +
  labs(y = expression("Nucleus area (" * mu * m^2 * ")")) +
  scale_y_log10(
    breaks = c(5, 10, 20, 50, 100, 200),  # 0 cannot be drawn on a log10 axis, so it is omitted
    limits = c(NA, 250)
  ) +
  scale_fill_manual(values = cols_class) +  # For boxplot fill
  scale_color_manual(values = cols_class) +
  geom_point(
    aes(colour = class),
    size = 1,
    alpha = .5,
    position = position_jitter(
      seed = 1, width = .2
    ))

plot_nucleus_area

# ------------------------------------------------------------------------------
# Clean the data for nucleus volume
dat_clean_nucleus_volume <- dat %>%
  filter(!is.na(nucleus_volume))

# Calculate summary statistics of nucleus volume for each class
df_summary <- dat_clean_nucleus_volume %>%
  group_by(class) %>%
  summarise(
    n_species = n_distinct(species),
    n_obs = n(),
    max_y_nucleus_volume = max(nucleus_volume, na.rm = TRUE))

# Plot for Nucleus Volume
plot_nucleus_volume <- ggplot(dat_clean_nucleus_volume, aes(x = class, y = nucleus_volume, fill = class)) +
  geom_boxplot(width = 0.6, 
               fill = "white",
               outlier.shape = NA) +
  geom_text(data = df_summary, 
            aes(y = max_y_nucleus_volume, label = paste0("N = ", n_obs, "\n(", n_species, " spp.)")), 
            vjust = -0.5, size = 2) +
  theme_pubr() +
  theme(
    legend.position = "none",
    axis.text.x = element_text(size = 6),
    axis.text.y = element_text(size = 6),
    axis.title.x = element_blank(),
    axis.title.y = element_text(size = 10),
    panel.border = element_rect(colour = "black", fill = NA, linewidth = 1)
  ) +
  labs(y = expression("Nucleus volume (" * mu * m^3 * ")")) +
  scale_y_log10(
    breaks = c(10, 20, 50, 150, 350, 1000),
    limits = c(NA, 2000)
  ) +
  scale_fill_manual(values = cols_class) +  # For boxplot fill
  scale_color_manual(values = cols_class) +  # For point colours
  geom_point(
    aes(colour = class),
    size = 1,
    alpha = .5,
    position = position_jitter(
      seed = 1, width = .2
    ))

plot_nucleus_volume

# ------------------------------------------------------------------------------
# Clean the data for mcv
dat_clean_mcv <- dat %>%
  filter(!is.na(mcv))

# Calculate summary statistics of mcv for each class
df_summary <- dat_clean_mcv %>%
  group_by(class) %>%
  summarise(
    n_species = n_distinct(species),
    n_obs = n(),
    max_y_mcv = max(mcv, na.rm = TRUE))

# Plot for MCV
plot_mcv <- ggplot(dat_clean_mcv, aes(x = class, y = mcv, fill = class)) +
  geom_boxplot(width = 0.6, 
               fill = "white",
               outlier.shape = NA) +
  geom_text(data = df_summary, 
            aes(y = max_y_mcv, label = paste0("N = ", n_obs, "\n(", n_species, " spp.)")), 
            vjust = -0.5, size = 2) +
  theme_pubr() +
  theme(
    legend.position = "none",
    axis.text.x = element_text(size = 6),
    axis.text.y = element_text(size = 6),
    axis.title.x = element_blank(),
    axis.title.y = element_text(size = 10),
    panel.border = element_rect(colour = "black", fill = NA, linewidth = 1)
  ) +
  labs(y = expression("Mean corpuscular volume (" * mu * m^3 * ")")) +
  scale_y_log10(
    breaks = c(10, 20, 50, 150, 350, 800, 2000, 6000),
    limits = c(NA, 15000)
  ) +
  scale_fill_manual(values = cols_class) +  # For boxplot fill
  scale_color_manual(values = cols_class) +  # For point colours
  geom_point(
    aes(colour = class),
    size = 1,
    alpha = .5,
    position = position_jitter(
      seed = 1, width = .2
    ))

plot_mcv

# ------------------------------------------------------------------------------
# Align the plots in a grid layout
Figure_8 <- plot_grid(
  plot_cell_area + theme(plot.margin = unit(c(1, 0, 0, 0.2), "cm")), 
  plot_nucleus_area + theme(plot.margin = unit(c(1, 0.5, 0, 0), "cm")),
  plot_cell_volume + theme(plot.margin = unit(c(1, 0, 0, 0.2), "cm")), 
  plot_nucleus_volume + theme(plot.margin = unit(c(1, 0.5, 0, 0), "cm")),
  plot_mcv + theme(plot.margin = unit(c(1, 0, 0, 0.2), "cm")),
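  nrow = 3,
  # Note: the label_* settings below only take effect when `labels` is supplied
  # (e.g. labels = "AUTO"); without it, plot_grid() draws no panel labels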
  label_size = 14,  # Slightly smaller labels for compact layout
  label_fontface = "bold",  # Make labels bold for clarity
  label_x = 0.1,  # Move labels closer to the plots horizontally
  label_y = 0.96,  # Adjust vertical placement of labels
  align = "hv"  # Ensure alignment
)

# Store Plots
ggsave('../manuscript/Figure_8.pdf', Figure_8, width = 7, height = 9)
ggsave('../manuscript/Figure_8.png', Figure_8, width = 7, height = 9, dpi = 1500)
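
The five trait panels above share almost all of their code. As a rough sketch only (not the code used to produce Figure 8), the common parts could be wrapped in a helper. The function name `plot_trait_sketch`, its arguments, and the toy data are hypothetical, and the styling is simplified (no `theme_pubr()`, sample-size labels, or custom breaks).

```r
library(ggplot2)

# Hypothetical helper: one boxplot-plus-jitter panel for a given trait column
plot_trait_sketch <- function(data, y_var, y_label) {
  data <- data[!is.na(data[[y_var]]), ]            # drop rows without the trait
  ggplot(data, aes(x = class, y = .data[[y_var]])) +
    geom_boxplot(width = 0.6, fill = "white", outlier.shape = NA) +
    geom_point(aes(colour = class), size = 1, alpha = 0.5,
               position = position_jitter(seed = 1, width = 0.2)) +
    scale_y_log10() +
    labs(y = y_label) +
    theme(legend.position = "none",
          axis.title.x = element_blank())
}

# Made-up data, for illustration only
toy <- data.frame(
  class     = c("Actinopterygii", "Actinopterygii", "Dipnoi", "Dipnoi", "Dipnoi"),
  cell_area = c(40, 55, 700, 900, NA)
)

p_toy <- plot_trait_sketch(toy, "cell_area", "Cell area")
```

Passing the column name as a string and looking it up with `.data[[y_var]]` keeps the helper reusable across `cell_area`, `cell_volume`, and the other traits.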

Data for manuscript

df_years <- refs %>% 
  as.data.frame() %>% 
  select(year) %>% 
  mutate(year = as.numeric(as.character(year))) %>%  # Convert the 'year' column to numeric
  filter(!is.na(year)) %>%  # Remove rows where 'year' is missing (NA)
  group_by(year) %>%
  summarise(num_studies = n()) %>%
  arrange(year) %>%
  mutate(cumulative_studies = cumsum(num_studies))  # Calculate cumulative count

Range of years included in the database

df_years %>% 
  reframe(min_year = min(year), 
          max_year = max(year), 
          total_years = max_year - min_year)
## # A tibble: 1 × 3
##   min_year max_year total_years
##      <dbl>    <dbl>       <dbl>
## 1     1875     2024         149
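
To illustrate the counting logic, here is a minimal sketch on a made-up set of publication years (`toy_refs` is hypothetical and not taken from `refs`). It also shows that `max - min` gives the span between the first and last year, which is one less than the inclusive number of calendar years covered.

```r
library(dplyr)

# Hypothetical publication years, stored as text as in many bibliographies
toy_refs <- data.frame(
  year = c("1990", "1990", "1995", NA, "2001", "2001", "2001"),
  stringsAsFactors = FALSE
)

toy_years <- toy_refs %>%
  mutate(year = as.numeric(year)) %>%          # text to numeric
  filter(!is.na(year)) %>%                     # drop missing years
  count(year, name = "num_studies") %>%        # same as group_by() + summarise(n())
  arrange(year) %>%
  mutate(cumulative_studies = cumsum(num_studies))

span_years      <- max(toy_years$year) - min(toy_years$year)  # 2001 - 1990
inclusive_years <- span_years + 1                             # counts both end years
```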

Extracting and counting journals

journals <- refs %>%
  as.data.frame() %>%
  select(journal) %>%
  filter(!is.na(journal)) %>%
  group_by(journal) %>%
  summarise(num_articles = n()) %>%
  arrange(desc(num_articles))

Journals with highest number of articles

journals %>% 
  slice_head(n = 15)
## # A tibble: 15 × 2
##    journal                                      num_articles
##    <chr>                                               <int>
##  1 Journal of Fish Biology                                14
##  2 Fish Physiology and Biochemistry                        9
##  3 Aquaculture                                             4
##  4 Aquaculture Research                                    4
##  5 Tissue and Cell                                         4
##  6 Aquaculture International                               3
##  7 Ecotoxicology and Environmental Safety                  3
##  8 Environmental Science and Pollution Research            3
##  9 Experimental medicine and surgery                       3
## 10 Fish and Shellfish Immunology                           3
## 11 Iranian Journal of Fisheries Sciences                   3
## 12 Journal of Applied Ichthyology                          3
## 13 Russian Journal of Marine Biology                       3
## 14 Aquatic Toxicology                                      2
## 15 Brazilian Journal of Biology                            2

Calculate stats for both hemispheres (southern and northern)

dat %>%
  filter(between(long_dec, -180, 180), between(lat_dec, -90, 90)) %>% # Include coordinates within expected ranges
  mutate(hemisphere = ifelse(lat_dec > 0, "Northern","Southern")) %>%
  group_by(hemisphere) %>%
  reframe(
    unique_studies   = n_distinct(key),
    unique_positions = n_distinct(paste(lat_dec, long_dec, sep = "_")), # Count unique positions, because some studies report more than one
    unique_species   = n_distinct(species_reported),
    records          = n())
## # A tibble: 2 × 5
##   hemisphere unique_studies unique_positions unique_species records
##   <chr>               <int>            <int>          <int>   <int>
## 1 Northern              112              131            259    1119
## 2 Southern               21               23             34     120
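
Note that with `lat_dec > 0`, a record located exactly on the equator would be labelled "Southern". The sketch below, on made-up coordinates (`toy` is hypothetical), shows one possible way to make that case explicit with `case_when()`; whether any record in the database sits at latitude 0 is not known from the output above.

```r
library(dplyr)

# Made-up coordinates: one out-of-range latitude and one point on the equator
toy <- data.frame(
  key      = c("s1", "s1", "s2", "s3", "s4"),
  lat_dec  = c(52.1, 52.1, -33.4, 0, 95),
  long_dec = c(4.3, 4.3, -70.6, 10, 20)
)

toy_hemi <- toy %>%
  filter(between(long_dec, -180, 180), between(lat_dec, -90, 90)) %>%
  mutate(hemisphere = case_when(
    lat_dec > 0 ~ "Northern",
    lat_dec < 0 ~ "Southern",
    TRUE        ~ "Equator"                    # latitude exactly 0
  )) %>%
  group_by(hemisphere) %>%
  summarise(
    unique_studies   = n_distinct(key),
    unique_positions = n_distinct(paste(lat_dec, long_dec, sep = "_")),
    records          = n(),
    .groups = "drop"
  )
```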

Number of species

n_spp <- dat %>%
  distinct(species) %>%
  nrow()
n_spp
## [1] 660

Number of species by lineage

dat %>%
  group_by(class) %>%
  reframe(n_spp = length(unique(species)), total_study = length(unique(key)),
          perc_species = (n_spp / n_distinct(dat$species)) * 100)  # Share of all species in the database (660)
## # A tibble: 4 × 4
##   class          n_spp total_study perc_species
##   <chr>          <int>       <int>        <dbl>
## 1 Actinopterygii   595         180       90.2  
## 2 Chondrichthyes    57          15        8.64 
## 3 Cyclostomata       5           4        0.758
## 4 Dipnoi             3           3        0.455

Counting studies by species

dat %>%
  group_by(species) %>%
  summarise(num_studies = n_distinct(key)) %>%
  arrange(desc(num_studies)) %>%
  slice_head(n = 15)  # Limit to top 15 species based on the number of studies
## # A tibble: 15 × 2
##    species                 num_studies
##    <chr>                         <int>
##  1 Oncorhynchus mykiss              17
##  2 Cyprinus carpio                  12
##  3 Labeo rohita                     10
##  4 Oreochromis niloticus             8
##  5 Ctenopharyngodon idella           7
##  6 Carassius auratus                 6
##  7 Channa punctata                   6
##  8 Clarias gariepinus                6
##  9 Salmo trutta                      6
## 10 Clarias batrachus                 5
## 11 Lophius piscatorius               5
## 12 Salmo salar                       5
## 13 Tinca tinca                       5
## 14 Dicentrarchus labrax              4
## 15 Echeneis naucrates                4

Counting records by species

dat %>%
  group_by(species) %>%
  summarise(num_records = n()) %>%
  arrange(desc(num_records)) %>%
  slice_head(n = 15)  # Limit to top 15 species based on the number of records
## # A tibble: 15 × 2
##    species                     num_records
##    <chr>                             <int>
##  1 Ctenopharyngodon idella              82
##  2 Oreochromis niloticus                71
##  3 Hypophthalmichthys molitrix          63
##  4 Labeo rohita                         29
##  5 Oncorhynchus mykiss                  24
##  6 Cyprinus carpio                      18
##  7 Dicentrarchus labrax                 17
##  8 Channa punctata                      15
##  9 Salmo salar                          13
## 10 Clarias batrachus                    12
## 11 Abudefduf saxatilis                  11
## 12 Haemulon aurolineatum                11
## 13 Haemulon flavolineatum               11
## 14 Holocentrus ascensionis              11
## 15 Lutjanus griseus                     11
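
The two rankings above differ because one counts distinct studies (`n_distinct(key)`) and the other counts rows (`n()`). A small sketch on made-up data (`toy` is hypothetical) shows both counts side by side.

```r
library(dplyr)

# Made-up records: sp1 appears in two studies, sp2 in one
toy <- data.frame(
  species = c("sp1", "sp1", "sp1", "sp2", "sp2"),
  key     = c("k1", "k1", "k2", "k3", "k3")
)

toy_counts <- toy %>%
  group_by(species) %>%
  summarise(
    num_studies = n_distinct(key),   # distinct study identifiers
    num_records = n(),               # number of rows (records)
    .groups = "drop"
  ) %>%
  arrange(desc(num_records))
```

A species can therefore rank high in records while coming from only a few studies, if those studies report many measurements each.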

Number of distinct species and studies by sex

dat %>% 
  group_by(sex) %>%
  reframe(n_spp = length(unique(species)), 
          total_studies = length(unique(key))) %>%
  arrange(desc(total_studies))
## # A tibble: 4 × 3
##   sex    n_spp total_studies
##   <fct>  <int>         <int>
## 1 <NA>     623           155
## 2 female    31            20
## 3 male      30            18
## 4 both      29            12

Number of distinct species and studies by life stage

dat %>%
  group_by(life_stage) %>%
  reframe(n_spp = length(unique(species)), 
          total_studies = length(unique(key))) %>%
  arrange(desc(total_studies))
## # A tibble: 4 × 3
##   life_stage  n_spp total_studies
##   <fct>       <int>         <int>
## 1 <NA>          611           104
## 2 adult          61            49
## 3 juvenile       30            32
## 4 fingerlings     3             4

Number of distinct species and studies by realm

dat %>%
  group_by(realm) %>%
  reframe(n_spp = length(unique(species)), 
          total_studies = length(unique(key))) %>%
  arrange(desc(total_studies))
## # A tibble: 5 × 3
##   realm                      n_spp total_studies
##   <chr>                      <int>         <int>
## 1 freshwater-brackish           54            71
## 2 freshwater                    93            62
## 3 freshwater-brackish-marine    73            60
## 4 marine                       307            29
## 5 marine-brackish              133            24

Helper function to summarise a variable

The following function identifies, for a given trait, the species with the lowest and the highest value, and reports the absolute range and the ratio between the two extremes.

get_summary <- function(var_name) {
  dat %>%
    group_by(species) %>%
    summarise(
      min_val = min(!!sym(var_name), na.rm = TRUE),
      max_val = max(!!sym(var_name), na.rm = TRUE)
    ) %>%
    summarise(
      min_species = species[which.min(min_val)],
      min_value = min(min_val, na.rm = TRUE),
      max_species = species[which.max(max_val)],
      max_value = max(max_val, na.rm = TRUE),
      range = max_value - min_value,
      magnitude_order = max_value / min_value  # Fold difference (max / min), not a log10 order of magnitude
    ) %>%
    mutate(variable = var_name) %>%
    select(variable, min_species, min_value, max_species, max_value, range, magnitude_order)
}

List of variables to analyze

variables <- c("cell_area", "nucleus_area", "cell_volume", "nucleus_volume", "mcv")

Generate the summary table

summary_table <- bind_rows(lapply(variables, get_summary))
summary_table
## # A tibble: 5 × 7
##   variable    min_species min_value max_species max_value  range magnitude_order
##   <chr>       <chr>           <dbl> <chr>           <dbl>  <dbl>           <dbl>
## 1 cell_area   Iranocichl…     16.2  Protopteru…      945.   928.            58.2
## 2 nucleus_ar… Iranocichl…      2.56 Proscymnod…      157.   155.            61.5
## 3 cell_volume Iranocichl…     41.1  Protopteru…    17024. 16983.           414. 
## 4 nucleus_vo… Iranocichl…      3.42 Protopteru…      710.   706.           207. 
## 5 mcv         Solea sene…     14.4  Protopteru…     6940   6926.           482.
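
As a hedged sketch of the same idea, the variant below takes the data frame as an argument, looks the column up with `.data[[var_name]]`, and removes missing values before grouping so that species without data do not produce `Inf` or `-Inf`. The function name `get_summary_sketch`, the `fold_change` column name, and the toy data are hypothetical; this is not the function used to build the table above.

```r
library(dplyr)

get_summary_sketch <- function(data, var_name) {
  data %>%
    filter(!is.na(.data[[var_name]])) %>%      # drop missing values up front
    group_by(species) %>%
    summarise(
      min_val = min(.data[[var_name]]),
      max_val = max(.data[[var_name]]),
      .groups = "drop"
    ) %>%
    summarise(
      variable    = var_name,
      min_species = species[which.min(min_val)],
      min_value   = min(min_val),
      max_species = species[which.max(max_val)],
      max_value   = max(max_val),
      range       = max_value - min_value,
      fold_change = max_value / min_value      # ratio of largest to smallest value
    )
}

# Made-up data: sp3 has no measurement and is ignored
toy <- data.frame(
  species   = c("sp1", "sp1", "sp2", "sp3"),
  cell_area = c(20, 30, 100, NA)
)

res <- get_summary_sketch(toy, "cell_area")
```

In this toy case the smallest value belongs to `sp1` (20), the largest to `sp2` (100), so the range is 80 and the fold difference is 5.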

Additional figures not included in the manuscript

Figure only Actinopterygii

summary_data_act <- dat %>%
  filter(class == "Actinopterygii") %>%
  group_by(species_underscored) %>%
  summarise(across(c(cell_area, cell_volume, nucleus_area, nucleus_volume, mcv), 
                   ~ mean(., na.rm = TRUE)), 
            .groups = "drop") %>%
  rename(
    "Cell area" = cell_area,
    "Cell volume" = cell_volume,
    "Nucleus area" = nucleus_area,
    "Nucleus volume" = nucleus_volume,
    "MCV" = mcv
  )

summary_data_act <- summary_data_act %>%
  mutate(across(c("Cell area", "Cell volume", "Nucleus area", "Nucleus volume", "MCV"), 
                ~ (.-min(., na.rm = TRUE)) / (max(., na.rm = TRUE) - min(., na.rm = TRUE))))  # scale min-max by variable
summary_data_act
## # A tibble: 595 × 6
##    species_underscored `Cell area` `Cell volume` `Nucleus area` `Nucleus volume`
##    <chr>                     <dbl>         <dbl>          <dbl>            <dbl>
##  1 Abalistes_stellatus      0.0722      NaN              0.116          NaN     
##  2 Abramis_brama            0.112         0.104        NaN              NaN     
##  3 Abudefduf_saxatilis      0.100         0.0949         0.0373         NaN     
##  4 Abudefduf_septemfa…    NaN           NaN            NaN                0.0152
##  5 Abudefduf_sordidus       0.179         0.212          0.452            0.155 
##  6 Abudefduf_taurus         0.111         0.111        NaN              NaN     
##  7 Abudefduf_vaigiens…      0.186         0.230          0.447            0.148 
##  8 Acanthocybium_sola…      0.150         0.143        NaN              NaN     
##  9 Acanthogobius_hasta      0.159         0.171          0.413            0.211 
## 10 Acanthopagrus_aust…    NaN           NaN            NaN                0.0339
## # ℹ 585 more rows
## # ℹ 1 more variable: MCV <dbl>
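
The scaling step maps each trait to the interval from 0 to 1. A minimal base R sketch (the helper name `min_max` and the input values are made up) shows the arithmetic. It also shows why `NaN` can appear in the table: taking `mean(..., na.rm = TRUE)` of a species with no values for a trait returns `NaN`, and the scaling leaves it as `NaN`.

```r
# Hypothetical helper: rescale a numeric vector to [0, 1], ignoring missing values
min_max <- function(x) {
  (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}

scaled <- min_max(c(10, 20, 50))      # smallest value maps to 0, largest to 1

# Mean of a trait with only missing values is NaN once NAs are removed
empty_mean <- mean(NA_real_, na.rm = TRUE)

# NaN stays NaN after scaling and does not affect the other values
scaled_with_nan <- min_max(c(10, empty_mean, 50))
```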
unique(summary_data_act$species_underscored)
##   [1] "Abalistes_stellatus"             "Abramis_brama"                  
##   [3] "Abudefduf_saxatilis"             "Abudefduf_septemfasciatus"      
##   [5] "Abudefduf_sordidus"              "Abudefduf_taurus"               
##   [7] "Abudefduf_vaigiensis"            "Acanthocybium_solandri"         
##   [9] "Acanthogobius_hasta"             "Acanthopagrus_australis"        
##  [11] "Acanthopagrus_butcheri"          "Acanthostracion_polygonius"     
##  [13] "Acanthostracion_quadricornis"    "Acanthurus_bahianus"            
##  [15] "Acanthurus_chirurgus"            "Acanthurus_coeruleus"           
##  [17] "Acanthurus_gahhm"                "Acanthurus_grammoptilus"        
##  [19] "Acipenser_brevirostrum"          "Acipenser_oxyrinchus"           
##  [21] "Acipenser_sinensis"              "Acipenser_sturio"               
##  [23] "Aeoliscus_strigatus"             "Albula_vulpes"                  
##  [25] "Aldrichetta_forsteri"            "Alosa_fallax"                   
##  [27] "Alphestes_afer"                  "Aluterus_schoepfii"             
##  [29] "Aluterus_scriptus"               "Ameiurus_catus"                 
##  [31] "Ammodytes_tobianus"              "Amphiprion_akindynos"           
##  [33] "Amphiprion_clarkii"              "Anabas_testudineus"             
##  [35] "Anguilla_bicolor"                "Anguilla_japonica"              
##  [37] "Anguilla_marmorata"              "Anguilla_rostrata"              
##  [39] "Aplodactylus_arctidens"          "Apogon_maculatus"               
##  [41] "Aprion_virescens"                "Archosargus_rhomboidalis"       
##  [43] "Arothron_manilensis"             "Arripis_trutta"                 
##  [45] "Astyanax_lineatus"               "Astyanax_mexicanus"             
##  [47] "Atheresthes_evermanni"           "Atractosteus_tristoechus"       
##  [49] "Aulacocephalus_temminckii"       "Auxis_thazard"                  
##  [51] "Bairdiella_ronchus"              "Balistapus_undulatus"           
##  [53] "Balistes_capriscus"              "Balistes_carolinensis"          
##  [55] "Balistes_vetula"                 "Barbatula_barbatula"            
##  [57] "Barbatula_toni"                  "Basilichthys_australis"         
##  [59] "Bathygobius_soporator"           "Belone_belone"                  
##  [61] "Bentartia_pusillum"              "Betta_splendens"                
##  [63] "Boops_boops"                     "Boreogadus_saida"               
##  [65] "Bothus_lunatus"                  "Bovichtus_angustifrons"         
##  [67] "Brachygenys_chrysargyrea"        "Brevoortia_tyrannus"            
##  [69] "Brycon_hilarii"                  "Bujurquina_vittata"             
##  [71] "Caelorinchus_innotabilis"        "Caesio_cuning"                  
##  [73] "Calamus_bajonado"                "Calamus_calamus"                
##  [75] "Calamus_penna"                   "Calamus_pennatula"              
##  [77] "Callionymus_lyra"                "Cantherhines_pullus"            
##  [79] "Canthigaster_bennetti"           "Canthigaster_valentini"         
##  [81] "Carangoides_bartholomaei"        "Carangoides_ruber"              
##  [83] "Caranx_carangus"                 "Caranx_hippos"                  
##  [85] "Caranx_ignobilis"                "Caranx_latus"                   
##  [87] "Caranx_lugubris"                 "Caranx_sexfasciatus"            
##  [89] "Carassius_auratus"               "Carassius_carassius"            
##  [91] "Carassius_gibelio"               "Catostomus_catostomus"          
##  [93] "Catostomus_commersonii"          "Centriscops_humerosus"          
##  [95] "Centropomus_undecimalis"         "Centropyge_bicolor"             
##  [97] "Cephalopholis_cruentata"         "Cephalopholis_fulva"            
##  [99] "Cephalopholis_miniata"           "Chaetodipterus_faber"           
## [101] "Chaetodon_capistratus"           "Chaetodon_lunulatus"            
## [103] "Chaetodon_ocellatus"             "Chaetodon_rainfordi"            
## [105] "Chaetodon_sedentarius"           "Chaetodon_striatus"             
## [107] "Channa_argus"                    "Channa_punctata"                
## [109] "Channa_striata"                  "Cheilinus_trilobatus"           
## [111] "Chelidonichthys_cuculus"         "Chelidonichthys_lucerna"        
## [113] "Chelon_ramada"                   "Chilomycterus_spinosus"         
## [115] "Chirocentrus_dorab"              "Chloroscombrus_chrysurus"       
## [117] "Choerodon_albigena"              "Choerodon_cephalotes"           
## [119] "Choerodon_fasciatus"             "Chromis_analis"                 
## [121] "Chromis_viridis"                 "Chrosomus_neogaeus"             
## [123] "Chrysiptera_cyanea"              "Cichlasoma_dimerus"             
## [125] "Ciliata_mustela"                 "Cirrhinus_mrigala"              
## [127] "Cirrhinus_reba"                  "Clarias_batrachus"              
## [129] "Clarias_gariepinus"              "Clupea_harengus"                
## [131] "Clupeonella_cultriventris"       "Cobitis_biwae"                  
## [133] "Cobitis_striata"                 "Cobitis_taenia"                 
## [135] "Cobitis_takatsuensis"            "Coelorinchus_maurofasciatus"    
## [137] "Colossoma_macropomum"            "Conger_conger"                  
## [139] "Contusus_brevicaudus"            "Coregonus_clupeaformis"         
## [141] "Coregonus_maraena"               "Coreius_guichenoti"             
## [143] "Coris_batuensis"                 "Coryphaena_hippurus"            
## [145] "Coryphaenoides_serrulatus"       "Corythoichthys_intestinalis"    
## [147] "Cottus_gobio"                    "Crossosalarias_macrospilus"     
## [149] "Cryptacanthodes_maculatus"       "Cryptocentrus_leptocephalus"    
## [151] "Ctenopharyngodon_idella"         "Cyclopteropsis_jordani"         
## [153] "Cyclopterus_lumpus"              "Cyprinus_carpio"                
## [155] "Dactylopterus_volitans"          "Danio_rerio"                    
## [157] "Dascyllus_aruanus"               "Datnioides_polota"              
## [159] "Delminichthys_ghetaldii"         "Diagramma_labiosum"             
## [161] "Diagramma_picta"                 "Diapterus_rhombeus"             
## [163] "Diastobranchus_capensis"         "Dicentrarchus_labrax"           
## [165] "Diodon_holocanthus"              "Diplodus_argenteus"             
## [167] "Diplodus_vulgaris"               "Diretmichthys_parini"           
## [169] "Dischistodus_prosopotaenia"      "Dissostichus_mawsoni"           
## [171] "Dormitator_latifrons"            "Drepane_punctata"               
## [173] "Echeneis_naucrates"              "Ecsenius_mandibularis"          
## [175] "Ecsenius_yaeyamaensis"           "Electrophorus_electricus"       
## [177] "Ellochelon_vaigiensis"           "Elopichthys_bambusa"            
## [179] "Elops_saurus"                    "Engraulis_anchoita"             
## [181] "Engraulis_encrasicolus"          "Epalzeorhynchos_bicolor"        
## [183] "Epalzeorhynchos_frenatum"        "Epinephelus_adscensionis"       
## [185] "Epinephelus_cyanopodus"          "Epinephelus_fasciatus"          
## [187] "Epinephelus_guttatus"            "Epinephelus_merra"              
## [189] "Epinephelus_ongus"               "Epinephelus_quoyans"            
## [191] "Epinephelus_spilotoceps"         "Epinephelus_striatus"           
## [193] "Equetus_pulcher"                 "Esox_lucius"                    
## [195] "Esox_niger"                      "Eucinostomus_argenteus"         
## [197] "Eucinostomus_gula"               "Eugerres_plumieri"              
## [199] "Eumicrotremus_spinosus"          "Eupomacentrus_fuscus"           
## [201] "Eupomacentrus_leucostictus"      "Eupomacentrus_variabilis"       
## [203] "Euristhmus_lepturus"             "Euthynnus_alletteratus"         
## [205] "Eutrigla_gurnardus"              "Farlowella_acus"                
## [207] "Fistularia_petimba"              "Gadus_morhua"                   
## [209] "Gaidropsarus_ensis"              "Gaidropsarus_mediterraneus"     
## [211] "Galaxias_maculatus"              "Galaxias_olidus"                
## [213] "Gambusia_holbrooki"              "Gasterosteus_aculeatus"         
## [215] "Gerres_cinereus"                 "Gerres_filamentosus"            
## [217] "Gerres_subfasciatus"             "Girella_elevata"                
## [219] "Girella_zebra"                   "Glyptocephalus_cynoglossus"     
## [221] "Glyptosternon_maculatum"         "Gnathanodon_speciosus"          
## [223] "Gobio_gobio"                     "Gobiocypris_rarus"              
## [225] "Gobiodon_citrinus"               "Gobius_cobitis"                 
## [227] "Gymnelus_viridis"                "Gymnocephalus_cernua"           
## [229] "Gymnocranius_audleyi"            "Gymnocypris_eckloni"            
## [231] "Gymnothorax_funebris"            "Gymnothorax_pictus"             
## [233] "Gymnothorax_vicinus"             "Gymnotus_inaequilabiatus"       
## [235] "Gyrinocheilus_aymonieri"         "Haemulon_aurolineatum"          
## [237] "Haemulon_flavolineatum"          "Haemulon_plumierii"             
## [239] "Haemulon_sciurus"                "Halargyreus_johnsonii"          
## [241] "Halichoeres_biocellatus"         "Halichoeres_bivittatus"         
## [243] "Halichoeres_garnoti"             "Halichoeres_radiatus"           
## [245] "Harengula_humeralis"             "Helicolenus_barathri"           
## [247] "Helicolenus_percoides"           "Hemiglyphidodon_plagiometopon"  
## [249] "Hemiramphus_brasiliensis"        "Hemitripterus_americanus"       
## [251] "Heteropneustes_fossilis"         "Heterotis_niloticus"            
## [253] "Hippocampus_abdominalis"         "Hippoglossus_hippoglossus"      
## [255] "Hirundichthys_affinis"           "Holacanthus_bermudensis"        
## [257] "Holacanthus_ciliaris"            "Holacanthus_tricolor"           
## [259] "Holocentrus_ascensionis"         "Holocentrus_rufus"              
## [261] "Hoplias_malabaricus"             "Hoplisoma_metae"                
## [263] "Hoplisoma_paleatus"              "Hucho_hucho"                    
## [265] "Huso_huso"                       "Hypophthalmichthys_molitrix"    
## [267] "Hypophthalmichthys_nobilis"      "Hypoplectrus_unicolor"          
## [269] "Hyporhamphus_melanochir"         "Hypostomus_boulengeri"          
## [271] "Hypostomus_plecostomus"          "Icelus_spatula"                 
## [273] "Ictalurus_punctatus"             "Idiacanthus_atlanticus"         
## [275] "Iranocichla_hormuzensis"         "Istigobius_rigilius"            
## [277] "Jenynsia_lineata"                "Kathetostoma_canaster"          
## [279] "Katsuwonus_pelamis"              "Konosirus_punctatus"            
## [281] "Labeo_catla"                     "Labeo_chrysophekadion"          
## [283] "Labeo_rohita"                    "Labrisomus_nuchipinnis"         
## [285] "Lachnolaimus_maximus"            "Lactophrys_trigonus"            
## [287] "Lagocephalus_lunaris"            "Lampris_regius"                 
## [289] "Lates_calcarifer"                "Lefua_echigonia"                
## [291] "Lefua_nikkonis"                  "Leiopotherapon_unicolor"        
## [293] "Lepidopsetta_bilineata"          "Lepomis_macrochirus"            
## [295] "Lethrinus_atkinsoni"             "Lethrinus_miniatus"             
## [297] "Lethrinus_nebulosus"             "Lethrinus_rubrioperculatus"     
## [299] "Leucaspius_delineatus"           "Leuciscus_idus"                 
## [301] "Limanda_aspera"                  "Limanda_limanda"                
## [303] "Liparis_tunicatus"               "Lipophrys_pholis"               
## [305] "Lophius_americanus"              "Lophius_piscatorius"            
## [307] "Lutjanus_adetii"                 "Lutjanus_analis"                
## [309] "Lutjanus_apodus"                 "Lutjanus_carponotatus"          
## [311] "Lutjanus_cyanopterus"            "Lutjanus_fulviflamma"           
## [313] "Lutjanus_griseus"                "Lutjanus_lutjanus"              
## [315] "Lutjanus_russellii"              "Lutjanus_sebae"                 
## [317] "Lutjanus_synagris"               "Lutjanus_vitta"                 
## [319] "Lutjanus_vivanus"                "Lycodichthys_dearborni"         
## [321] "Macropodus_opercularis"          "Macrourus_berglax"              
## [323] "Makaira_nigricans"               "Megaleporinus_macrocephalus"    
## [325] "Megalobrama_amblycephala"        "Megalops_cyprinoides"           
## [327] "Melanogrammus_aeglefinus"        "Merlangius_merlangus"           
## [329] "Merluccius_bilinearis"           "Merluccius_hubbsi"              
## [331] "Merluccius_merluccius"           "Mesogobius_batrachocephalus"    
## [333] "Mesovagus_antipodum"             "Metynnis_hypsauchen"            
## [335] "Metynnis_maculatus"              "Micropogonias_furnieri"         
## [337] "Micropterus_coosae"              "Micropterus_salmoides"          
## [339] "Microspathodon_chrysurus"        "Misgurnus_anguillicaudatus"     
## [341] "Mola_mola"                       "Monopterus_albus"               
## [343] "Morone_americana"                "Morone_saxatilis"               
## [345] "Mugil_cephalus"                  "Mugil_curema"                   
## [347] "Mugil_liza"                      "Mulloidichthys_martinicus"      
## [349] "Mulloidichthys_vanicolensis"     "Mullus_barbatus"                
## [351] "Mullus_surmuletus"               "Muraenesox_cinereus"            
## [353] "Myoxocephalus_octodecemspinosus" "Myoxocephalus_quadricornis"     
## [355] "Myoxocephalus_scorpius"          "Myripristis_jacobus"            
## [357] "Mystus_vittatus"                 "Myxocyprinus_asiaticus"         
## [359] "Myxus_elongatus"                 "Myzopsetta_ferruginea"          
## [361] "Nectamia_savayensis"             "Nematalosa_come"                
## [363] "Neocyttus_rhomboidalis"          "Neogobius_melanostomus"         
## [365] "Neoscopelus_macrolepidotus"      "Niwaella_delicata"              
## [367] "Notolabrus_tetricus"             "Notopterus_notopterus"          
## [369] "Novaculichthys_taeniourus"       "Nuchequula_decora"              
## [371] "Ocyurus_chrysurus"               "Odontesthes_argentinensis"      
## [373] "Ogcocephalus_vespertilio"        "Oligoplites_saurus"             
## [375] "Oncorhynchus_keta"               "Oncorhynchus_kisutch"           
## [377] "Oncorhynchus_mykiss"             "Oncorhynchus_tshawytscha"       
## [379] "Ophichthus_cephalozona"          "Oplopomus_oplopomus"            
## [381] "Opsanus_tau"                     "Oreochromis_mossambicus"        
## [383] "Oreochromis_niloticus"           "Osmerus_eperlanus"              
## [385] "Osmerus_mordax"                  "Ostorhinchus_cookii"            
## [387] "Ostorhinchus_endekataenia"       "Ostorhinchus_guamensis"         
## [389] "Pachypanchax_playfairii"         "Pagellus_bogaraveo"             
## [391] "Pagellus_erythrinus"             "Pagrus_auratus"                 
## [393] "Pangasianodon_hypophthalmus"     "Parachanna_obscura"             
## [395] "Paragobiodon_xanthosoma"         "Paralichthys_lethostigma"       
## [397] "Paralichthys_olivaceus"          "Paramugil_georgii"              
## [399] "Parapercis_cylindrica"           "Parapercis_hexophtalma"         
## [401] "Parupeneus_forsskali"            "Pelates_quadrilineatus"         
## [403] "Pempheris_schomburgkii"          "Perca_flavescens"               
## [405] "Perca_fluviatilis"               "Petroscirtes_fallax"            
## [407] "Petroscirtes_lupus"              "Petroscirtes_mitratus"          
## [409] "Phoxinus_phoxinus"               "Piaractus_brachypomus"          
## [411] "Piaractus_mesopotamicus"         "Pimelodella_gracilis"           
## [413] "Plagioscion_squamosissimus"      "Plagiotremus_rhinorhynchos"     
## [415] "Planiliza_macrolepis"            "Platichthys_flesus"             
## [417] "Platybelone_argala"              "Platycephalus_bassensis"        
## [419] "Platycephalus_indicus"           "Plectropomus_leopardus"         
## [421] "Pleuronectes_platessa"           "Podothecus_accipenserinus"      
## [423] "Poecilia_mexicana"               "Poecilia_reticulata"            
## [425] "Pollachius_virens"               "Polypterus_palmas"              
## [427] "Polypterus_senegalus"            "Pomacanthus_arcuatus"           
## [429] "Pomacanthus_paru"                "Pomacentrus_nagasakiensis"      
## [431] "Pomadasys_kaakan"                "Porichthys_porosissimus"        
## [433] "Premnas_biaculeatus"             "Priacanthus_tayenus"            
## [435] "Prionotus_carolinus"             "Prionotus_evolans"              
## [437] "Pristiapogon_kallopterus"        "Prochilodus_lineatus"           
## [439] "Prognathodes_aculeatus"          "Prosopium_cylindraceum"         
## [441] "Psalidodon_anisitsi"             "Psettodes_erumei"               
## [443] "Pseudaphritis_urvillii"          "Pseudocaranx_dentex"            
## [445] "Pseudomonacanthus_peroni"        "Pseudoplatystoma_corruscans"    
## [447] "Pseudopleuronectes_americanus"   "Pseudorhombus_jenynsii"         
## [449] "Pseudupeneus_maculatus"          "Pterois_volitans"               
## [451] "Pterophyllum_scalare"            "Pterygoplichthys_pardalis"      
## [453] "Pungitius_pungitius"             "Pygocentrus_nattereri"          
## [455] "Rachycentron_canadum"            "Rastrelliger_kanagurta"         
## [457] "Repomucenus_limiceps"            "Rhamdia_quelen"                 
## [459] "Rhamphichthys_rostratus"         "Rhinesomus_triqueter"           
## [461] "Rhynchocypris_lagowskii"         "Rita_rita"                      
## [463] "Rutilus_kutum"                   "Rutilus_rutilus"                
## [465] "Rypticus_saponaceus"             "Salminus_affinis"               
## [467] "Salmo_caspius"                   "Salmo_salar"                    
## [469] "Salmo_trutta"                    "Salvelinus_alpinus"             
## [471] "Salvelinus_fontinalis"           "Salvelinus_namaycush"           
## [473] "Salvelinus_umbla"                "Sander_vitreus"                 
## [475] "Sarda_australis"                 "Sardina_pilchardus"             
## [477] "Sardinella_gibbosa"              "Sarpa_salpa"                    
## [479] "Scardinius_erythrophthalmus"     "Scarus_coeruleus"               
## [481] "Scarus_croicensis"               "Scarus_ghobban"                 
## [483] "Scarus_guacamaia"                "Scarus_schlegeli"               
## [485] "Scarus_taeniopterus"             "Schizopyge_niger"               
## [487] "Schizothorax_plagiostomus"       "Schizothorax_prenanti"          
## [489] "Scleropages_jardinii"            "Scolopsis_monogramma"           
## [491] "Scolopsis_vosmeri"               "Scomber_scombrus"               
## [493] "Scomberomorus_regalis"           "Scophthalmus_maximus"           
## [495] "Scophthalmus_rhombus"            "Scorpaena_cardinalis"           
## [497] "Scorpaena_plumieri"              "Scorpaena_porcus"               
## [499] "Scorpaenopsis_oxycephalus"       "Scorpis_aequipinnis"            
## [501] "Sebastes_alutus"                 "Sebastes_marinus"               
## [503] "Sebastes_ocutalus"               "Sebastes_polyspinis"            
## [505] "Sebastes_schlegelii"             "Selene_vomer"                   
## [507] "Selenotoca_multifasciata"        "Semotilus_corporalis"           
## [509] "Seriola_hippos"                  "Seriola_lalandi"                
## [511] "Seriola_quinqueradiata"          "Serrasalmus_eigenmanni"         
## [513] "Siganus_doliatus"                "Siganus_fuscescens"             
## [515] "Siganus_lineatus"                "Siganus_spinus"                 
## [517] "Siganus_sutor"                   "Signigobius_biocellatus"        
## [519] "Sillaginodes_punctatus"          "Sillago_analis"                 
## [521] "Silurus_asotus"                  "Siniperca_chuatsi"              
## [523] "Solea_senegalensis"              "Solea_solea"                    
## [525] "Soleichthys_heterorhinos"        "Sorubim_cuspicaudus"            
## [527] "Sorubim_lima"                    "Sparisoma_aurofrenatum"         
## [529] "Sparisoma_chrysopterum"          "Sparisoma_radians"              
## [531] "Sparisoma_viride"                "Sparus_aurata"                  
## [533] "Sphoeroides_greeleyi"            "Sphoeroides_maculatus"          
## [535] "Sphoeroides_spengleri"           "Sphoeroides_testudineus"        
## [537] "Sphyraena_barracuda"             "Sphyraena_obtusata"             
## [539] "Spicara_maena"                   "Sprattus_sprattus"              
## [541] "Squalius_cephalus"               "Stenotomus_chrysops"            
## [543] "Stephanolepis_hispida"           "Sufflamen_fraenatus"            
## [545] "Symphodus_tinca"                 "Synbranchus_marmoratus"         
## [547] "Syngnathus_fuscus"               "Syngnathus_scovelli"            
## [549] "Syngnathus_typhle"               "Synodontis_notatus"             
## [551] "Synodus_intermedius"             "Synodus_sageneus"               
## [553] "Tachysurus_fulvidraco"           "Taurulus_bubalis"               
## [555] "Tautoga_onitis"                  "Terapon_jarbua"                 
## [557] "Terapon_puta"                    "Tetrabrachium_ocellatum"        
## [559] "Tetraodon_nigroviridis"          "Thalassoma_bifasciatum"         
## [561] "Thalassoma_klunzingeri"          "Thalassoma_lucasanum"           
## [563] "Thalassoma_lunare"               "Thunnus_alalunga"               
## [565] "Thunnus_albacares"               "Thunnus_atlanticus"             
## [567] "Thymallus_arcticus"              "Thymallus_thymallus"            
## [569] "Tinca_tinca"                     "Trachinocephalus"               
## [571] "Trachinotus_botla"               "Trachinotus_coppingeri"         
## [573] "Trachinotus_falcatus"            "Trachinus_draco"                
## [575] "Trachurus_trachurus"             "Trachystoma_petardi"            
## [577] "Trematomus_bernacchii"           "Tripodichthys_angustifrons"     
## [579] "Trisopterus_luscus"              "Turrum_fulvoguttatum"           
## [581] "Turrum_gymnostethus"             "Tylosurus_gavialoides"          
## [583] "Ucla_xenogrammus"                "Ulua_aurochs"                   
## [585] "Umbrina_coroides"                "Upeneichthys_lineatus"          
## [587] "Uranoscopus_scaber"              "Urophycis_tenuis"               
## [589] "Valenciennea_longipinnis"        "Xiphias_gladius"                
## [591] "Xiphophorus_hellerii"            "Zebrasoma_scopas"               
## [593] "Zenopsis_nebulosus"              "Zeus_faber"                     
## [595] "Zoarces_americanus"
# Identify species to drop
species_to_drop <- setdiff(tree$tip.label, summary_data_act$species_underscored)
# This drops 80 species (non-bony fishes) from the tree

# Prune the tree
tree_act <- drop.tip(tree, species_to_drop)

# Align data and tree
datF <- summary_data_act %>%
  column_to_rownames("species_underscored")

dat_tree <- datF %>%
  filter(row.names(.) %in% tree_act$tip.label)

# Add missing species
missing_species <- setdiff(tree_act$tip.label, row.names(dat_tree))
dat_tree_NA <- data.frame(matrix(NA, nrow = length(missing_species), ncol = ncol(datF)))
row.names(dat_tree_NA) <- missing_species
colnames(dat_tree_NA) <- colnames(datF)
Data <- rbind(dat_tree, dat_tree_NA)
Data <- Data[match(tree_act$tip.label, row.names(Data)), ]

# Plot with species names
circ_names_act <- ggtree(tree_act, layout = "fan", open.angle = 15, branch.length="none") + 
  geom_tiplab(offset = 8, hjust = 0, size = 0.9)
circ_names_act <- rotate_tree(circ_names_act, 90)
circ_names_act

# Create a new plot with heatmap for each trait using a single scale
tree_data_act <- gheatmap(
  circ_names_act, 
  Data, 
  width = 0.2, 
  offset = 0,  
  colnames_offset_x = 0, 
  colnames_offset_y = 0, 
  font.size = 4, 
  hjust = 0
)
tree_data_act

# Apply the same scale for all traits
tree_data_act <- tree_data_act +
  scale_fill_viridis_c(option = "H", name = "Normalised Cell Traits", na.value = "grey90") +
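  theme(
    # ggplot2 >= 3.5.0: numeric legend positions go in legend.position.inside,
    # and title alignment is set through the legend.title element
    legend.position = "inside",
    legend.position.inside = c(0.59, 0.55),
    legend.title.position = "top",
    legend.title = element_text(hjust = 0.5),
    legend.direction = "horizontal",
    legend.key = element_blank(),
    legend.background = element_blank(),
    legend.key.width = unit(0.9, "cm"),
    legend.key.height = unit(0.7, "cm")
  )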

# ------------------------------------------------------------------------------
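
The alignment step above pads the trait table with all-NA rows for any tree tip that lacks data, then reorders the rows to follow `tip.label`. Because the tree is first pruned to species present in the trait table, the set of missing tips is expected to be empty in this pipeline, so the padding acts as a safeguard. The following is a minimal base-R sketch of that logic on toy data. The names `tip_labels`, `traits`, `na_block`, and `aligned` are hypothetical and stand in for `tree_act$tip.label`, `dat_tree`, `dat_tree_NA`, and `Data`.

```r
# Hypothetical tip labels standing in for tree_act$tip.label
tip_labels <- c("sp_c", "sp_a", "sp_b")

# Hypothetical trait table; sp_b has no trait record
traits <- data.frame(
  cell_area = c(0.2, 0.9),
  mcv       = c(0.4, NA),
  row.names = c("sp_a", "sp_c")
)

# Build an all-NA block for tips that are missing from the trait table
missing_tips <- setdiff(tip_labels, row.names(traits))
na_block <- data.frame(
  matrix(NA_real_, nrow = length(missing_tips), ncol = ncol(traits))
)
row.names(na_block) <- missing_tips
colnames(na_block) <- colnames(traits)

# Stack both blocks and reorder so that row i corresponds to tip i
aligned <- rbind(traits, na_block)
aligned <- aligned[match(tip_labels, row.names(aligned)), , drop = FALSE]

print(aligned)
```

Reordering with `match()` rather than sorting keeps the table in the same order as the tree tips, whatever order the species had in the original data.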

Figure non-Actinopterygii

summary_data_no_act <- dat %>%
  filter(class != "Actinopterygii") %>%
  group_by(species_underscored) %>%
  summarise(across(c(cell_area, cell_volume, nucleus_area, nucleus_volume, mcv), 
                   ~ mean(., na.rm = TRUE)), 
            .groups = "drop") %>%
  rename(
    "Cell area" = cell_area,
    "Cell volume" = cell_volume,
    "Nucleus area" = nucleus_area,
    "Nucleus volume" = nucleus_volume,
    "MCV" = mcv
  ) 

summary_data_no_act <- summary_data_no_act %>%
  mutate(across(c("Cell area", "Cell volume", "Nucleus area", "Nucleus volume", "MCV"), 
                ~ (.-min(., na.rm = TRUE)) / (max(., na.rm = TRUE) - min(., na.rm = TRUE))))  # scale min-max by variable
summary_data_no_act
## # A tibble: 65 × 6
##    species_underscored `Cell area` `Cell volume` `Nucleus area` `Nucleus volume`
##    <chr>                     <dbl>         <dbl>          <dbl>            <dbl>
##  1 Aetobatus_narinari       0.133        0.0759        NaN              NaN     
##  2 Aptychotrema_rostr…      0.153      NaN               0.185          NaN     
##  3 Bathyraja_parmifera      0.140      NaN               0.327          NaN     
##  4 Bathytoshia_centro…      0.145        0.0845          0.217            0.249 
##  5 Carcharhinus_brach…      0.0800     NaN               0.135          NaN     
##  6 Carcharhinus_falci…      0.0762       0.0390        NaN              NaN     
##  7 Carcharhinus_leucas      0.0790       0.0404        NaN              NaN     
##  8 Carcharhinus_macul…      0.0178       0.00194         0.0498           0.0335
##  9 Carcharhinus_melan…      0.0424     NaN               0.103          NaN     
## 10 Carcharhinus_milbe…      0.114        0.0794        NaN              NaN     
## # ℹ 55 more rows
## # ℹ 1 more variable: MCV <dbl>
unique(summary_data_no_act$species_underscored)
##  [1] "Aetobatus_narinari"         "Aptychotrema_rostrata"     
##  [3] "Bathyraja_parmifera"        "Bathytoshia_centroura"     
##  [5] "Carcharhinus_brachyurus"    "Carcharhinus_falciformis"  
##  [7] "Carcharhinus_leucas"        "Carcharhinus_maculipinnis" 
##  [9] "Carcharhinus_melanopterus"  "Carcharhinus_milberti"     
## [11] "Carcharhinus_obscurus"      "Carcharhinus_plumbeus"     
## [13] "Centroscymnus_coelolepis"   "Centroscymnus_crepidater"  
## [15] "Centroscymnus_owstoni"      "Chiloscyllium_punctatum"   
## [17] "Dipturus_batis"             "Dipturus_chilensis"        
## [19] "Dipturus_laevis"            "Etmopterus_brachyurus"     
## [21] "Etmopterus_granulosus"      "Fluvitrygon_signifer"      
## [23] "Galeocerdo_cuvier"          "Geotria_australis"         
## [25] "Ginglymostoma_cirratum"     "Hemiscyllium_ocellatum"    
## [27] "Hemitrygon_bennettii"       "Heterodontus_francisci"    
## [29] "Hypanus_americanus"         "Isurus_oxyrinchus"         
## [31] "Lamna_nasus"                "Lampetra_fluviatilis"      
## [33] "Lampetra_planeri"           "Leucoraja_erinaceus"       
## [35] "Leucoraja_ocellata"         "Mustelus_canis"            
## [37] "Myxine_glutinosa"           "Nebrius_ferrugineus"       
## [39] "Negaprion_brevirostris"     "Neoceratodus_forsteri"     
## [41] "Neotrygon_kuhlii"           "Orectolobus_ornatus"       
## [43] "Oxynotus_bruniensis"        "Oxynotus_centrina"         
## [45] "Petromyzon_marinus"         "Prionace_glauca"           
## [47] "Proscymnodon_plunketi"      "Protopterus_aethiopicus"   
## [49] "Protopterus_annectens"      "Raja_clavata"              
## [51] "Raja_montagui"              "Rhizoprionodon_terraenovae"
## [53] "Rostroraja_eglanteria"      "Scyliorhinus_canicula"     
## [55] "Scyliorhinus_stellaris"     "Sphyrna_lewini"            
## [57] "Sphyrna_mokarran"           "Sphyrna_tiburo"            
## [59] "Sphyrna_tudes"              "Sphyrna_zygaena"           
## [61] "Squalus_acanthias"          "Squatina_australis"        
## [63] "Squatina_squatina"          "Tetronarce_nobiliana"      
## [65] "Torpedo_torpedo"
# Identify species to drop
species_to_drop <- setdiff(tree$tip.label, summary_data_no_act$species_underscored)

# Prune the tree
tree_no_act <- drop.tip(tree, species_to_drop)

# Align data and tree
datF <- summary_data_no_act %>%
  column_to_rownames("species_underscored")

dat_tree <- datF %>%
  filter(row.names(.) %in% tree_no_act$tip.label)

# Add missing species
missing_species <- setdiff(tree_no_act$tip.label, row.names(dat_tree))
dat_tree_NA <- data.frame(matrix(NA, nrow = length(missing_species), ncol = ncol(datF)))
row.names(dat_tree_NA) <- missing_species
colnames(dat_tree_NA) <- colnames(datF)
Data <- rbind(dat_tree, dat_tree_NA)
Data <- Data[match(tree_no_act$tip.label, row.names(Data)), ]

# Plot with species names
circ_names_no_act <- ggtree(tree_no_act, layout = "fan", open.angle = 15, branch.length="none") + 
  geom_tiplab(offset = 4, hjust = 0, size = 2)
circ_names_no_act <- rotate_tree(circ_names_no_act, 90)
circ_names_no_act

# Create a new plot with heatmap for each trait using a single scale
tree_data_no_act <- gheatmap(
  circ_names_no_act, 
  Data, 
  width = 0.2, 
  offset = 0,
  colnames_offset_x = 0, 
  colnames_offset_y = 0, 
  font.size = 3, 
  hjust = 0
)
tree_data_no_act

# Apply the same scale for all traits
tree_data_no_act <- tree_data_no_act +
  scale_fill_viridis_c(option = "H", 
                       name = "Normalised Cell Traits", 
                       na.value = "grey90") +
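  theme(
    # ggplot2 >= 3.5.0: numeric legend positions go in legend.position.inside,
    # and title alignment is set through the legend.title element
    legend.position = "inside",
    legend.position.inside = c(0.6, 0.55),
    legend.title.position = "top",
    legend.title = element_text(hjust = 0.5),
    legend.direction = "horizontal",
    legend.key = element_blank(),
    legend.background = element_blank(),
    legend.key.width = unit(1, "cm"),
    legend.key.height = unit(0.7, "cm")
  )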

tree_data_no_act
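
The heatmap columns share one colour scale because each trait is min-max scaled to the interval [0, 1] before plotting. The `NaN` entries in the summary table arise because `mean(., na.rm = TRUE)` returns `NaN` when a species has no values for a trait. The sketch below, in base R with toy numbers, shows one way to make that scaling explicit and to turn `NaN` into `NA`. The helper `rescale_min_max()` and the data frame `toy_means` are hypothetical and not part of the original workflow. Returning `NA` for an all-missing or constant column is a design choice of this sketch.

```r
# Min-max scaling that ignores missing values.
# Returns NA (not NaN/Inf) for an all-missing or constant column.
rescale_min_max <- function(x) {
  x[is.nan(x)] <- NA_real_                 # treat NaN as missing
  if (all(is.na(x))) {
    return(rep(NA_real_, length(x)))
  }
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) {
    return(rep(NA_real_, length(x)))       # avoid division by zero
  }
  (x - rng[1]) / (rng[2] - rng[1])
}

# Hypothetical species-level means; NaN marks a trait with no observations
toy_means <- data.frame(
  cell_area   = c(40, 120, 80),
  cell_volume = c(NaN, 300, 900)
)

# Scale every trait column independently
toy_scaled <- as.data.frame(lapply(toy_means, rescale_min_max))
print(toy_scaled)
```

Each column is scaled independently, so a value of 1 marks the largest species mean for that trait only and values are not comparable across traits in absolute terms.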

Add taxonomy based on FishBase

# Extract the list of species names reported in the original dataset
species_list <- dat$species_reported

# Validate the species names using FishBase to ensure accuracy
validated_species <- validate_names(species_list)

# Load the complete taxonomy backbone from FishBase
taxonomy_data <- load_taxa()

# Filter the taxonomy data to retain only the species that were validated
# Select key taxonomic ranks: Class, Order, Family, Genus, and Species
# Lowercase the column names and append '_fish_base' to mark FishBase as the source
taxonomy_fb <- taxonomy_data %>%
  filter(Species %in% validated_species) %>%
  select(Class, Order, Family, Genus, Species) %>%
  rename_with(~ tolower(.x)) %>%
  rename_with(~ paste0(.x, "_fish_base"))

# Join the FishBase taxonomy information back to the original dataset
dat <- dat %>%
  left_join(taxonomy_fb, by = c("species_reported" = "species_fish_base"))

# A quick inspection of the joined data shows that many reported species names have no match in FishBase, so the FishBase backbone taxonomy is NA for those species.
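
One likely contributor to these NA values is that the join key is the reported name, while the taxonomy table was filtered by validated names. A reported synonym therefore finds no partner even when its valid name is in FishBase. The sketch below shows the difference on toy data, using base-R `merge()` as a stand-in for `left_join()`. All species and family names are invented. The lookup table `name_lookup` assumes that a one-to-one mapping from reported to validated names can be built, which the source does not show.

```r
# Toy records; species names are invented for illustration
records <- data.frame(
  species_reported = c("Genus_a old_name", "Genus_b valid_name", "Genus_c unknown"),
  cell_area        = c(50, 80, 65)
)

# Hypothetical lookup from reported to validated names (NA = unresolved)
name_lookup <- data.frame(
  species_reported  = records$species_reported,
  species_validated = c("Genus_a new_name", "Genus_b valid_name", NA)
)

# Hypothetical taxonomy backbone keyed by valid names
taxonomy <- data.frame(
  species_fish_base = c("Genus_a new_name", "Genus_b valid_name"),
  family_fish_base  = c("Family_a", "Family_b")
)

# Join on the reported name: the synonym and the unknown name both get NA
by_reported <- merge(records, taxonomy,
                     by.x = "species_reported", by.y = "species_fish_base",
                     all.x = TRUE)

# Join through the validated name: only the unresolved name stays NA
with_valid   <- merge(records, name_lookup, by = "species_reported")
by_validated <- merge(with_valid, taxonomy,
                      by.x = "species_validated", by.y = "species_fish_base",
                      all.x = TRUE)

# Count records without a FishBase family under each approach
n_na_reported  <- sum(is.na(by_reported$family_fish_base))
n_na_validated <- sum(is.na(by_validated$family_fish_base))
print(c(reported = n_na_reported, validated = n_na_validated))
```

Counting the NA values after each join gives a quick measure of how many records are lost to name mismatches rather than to genuine absence from FishBase.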

Export the ErythroCite Database

# Check the column names and select the most relevant columns
names(dat)
##  [1] "species_reported"     "double_checked"       "database"            
##  [4] "key"                  "body_mass_gram"       "sex"                 
##  [7] "life_stage"           "lat_dec"              "long_dec"            
## [10] "location_description" "sample_size"          "number_of_specimens" 
## [13] "estimate_error_type"  "cell_length"          "cell_length_error"   
## [16] "cell_width"           "cell_width_error"     "cell_area"           
## [19] "cell_area_error"      "cell_volume"          "cell_volume_error"   
## [22] "mcv"                  "mcv_error"            "nucleus_length"      
## [25] "nucleus_length_error" "nucleus_width"        "nucleus_width_error" 
## [28] "nucleus_area"         "nucleus_area_error"   "nucleus_volume"      
## [31] "nucleus_volume_error" "notes"                "phylum"              
## [34] "class"                "order"                "family"              
## [37] "genus"                "species"              "source"              
## [40] "taxo_level"           "isMarine"             "isBrackish"          
## [43] "isFresh"              "realm"                "species_underscored" 
## [46] "cell_length_sd"       "cell_width_sd"        "cell_area_sd"        
## [49] "cell_volume_sd"       "mcv_sd"               "nucleus_length_sd"   
## [52] "nucleus_width_sd"     "nucleus_area_sd"      "nucleus_volume_sd"   
## [55] "address"              "country_collection"   "subcontinent"        
## [58] "class_fish_base"      "order_fish_base"      "family_fish_base"    
## [61] "genus_fish_base"
# Select the most relevant columns and order them as needed
ErythroCite_DB_v1.0.0 <- dat %>% 
  select(key, phylum, class, order, family, genus, species, species_reported, species_underscored,
         class_fish_base, order_fish_base, family_fish_base, genus_fish_base,
         database,
         location_description, lat_dec, long_dec, country_collection, subcontinent, realm,
         body_mass_gram, sex, life_stage, number_of_specimens,
         cell_length, cell_width, cell_area, cell_volume, mcv,
         nucleus_length, nucleus_width, nucleus_area, nucleus_volume,
         cell_length_sd, cell_width_sd, cell_area_sd, cell_volume_sd, mcv_sd,
         nucleus_length_sd, nucleus_width_sd, nucleus_area_sd, nucleus_volume_sd, notes)

names(ErythroCite_DB_v1.0.0)
##  [1] "key"                  "phylum"               "class"               
##  [4] "order"                "family"               "genus"               
##  [7] "species"              "species_reported"     "species_underscored" 
## [10] "class_fish_base"      "order_fish_base"      "family_fish_base"    
## [13] "genus_fish_base"      "database"             "location_description"
## [16] "lat_dec"              "long_dec"             "country_collection"  
## [19] "subcontinent"         "realm"                "body_mass_gram"      
## [22] "sex"                  "life_stage"           "number_of_specimens" 
## [25] "cell_length"          "cell_width"           "cell_area"           
## [28] "cell_volume"          "mcv"                  "nucleus_length"      
## [31] "nucleus_width"        "nucleus_area"         "nucleus_volume"      
## [34] "cell_length_sd"       "cell_width_sd"        "cell_area_sd"        
## [37] "cell_volume_sd"       "mcv_sd"               "nucleus_length_sd"   
## [40] "nucleus_width_sd"     "nucleus_area_sd"      "nucleus_volume_sd"   
## [43] "notes"
# Export the file as CSV
write.csv(ErythroCite_DB_v1.0.0, "../manuscript/ErythroCite_DB_v1.0.0.csv", row.names = FALSE)

# Export the file as Excel
writexl::write_xlsx(ErythroCite_DB_v1.0.0, "../manuscript/ErythroCite_DB_v1.0.0.xlsx")
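
A quick round-trip check can confirm that an exported CSV reads back with the same columns and row count. The sketch below is a minimal base-R example on a toy table written to a temporary file. `toy_db` and `out_file` are hypothetical stand-ins for `ErythroCite_DB_v1.0.0` and the manuscript path.

```r
# Toy stand-in for the exported database
toy_db <- data.frame(
  key       = c("ref_1", "ref_2"),
  cell_area = c(50.2, NA),
  notes     = c("smear, stained", NA),
  stringsAsFactors = FALSE
)

# Write to a temporary file so nothing in the project is overwritten
out_file <- tempfile(fileext = ".csv")
write.csv(toy_db, out_file, row.names = FALSE)

# Read the file back and compare its structure with the original
reloaded <- read.csv(out_file, stringsAsFactors = FALSE)

same_columns <- identical(names(reloaded), names(toy_db))
same_rows    <- nrow(reloaded) == nrow(toy_db)
print(c(same_columns = same_columns, same_rows = same_rows))
```

The `notes` value contains a comma on purpose: `write.csv()` quotes character fields, so the text should survive the round trip as a single cell.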

References

  • Pottier, P., Burke, S., Drobniak, S. M., Lagisz, M. & Nakagawa, S. Sexual (in)equality? A meta-analysis of sex differences in thermal acclimation capacity across ectotherms. Functional Ecology 35, 2663–2678 (2021).

  • Benfey, T. J. & Sutterlin, A. M. The haematology of triploid landlocked Atlantic salmon, Salmo salar L. Journal of Fish Biology 24, 333–338 (1984).

  • Gregory, T. R. Animal genome size database. http://www.genomesize.com/. (2024).

Session information

session_info() %>%
  details(summary = 'Current Session Information', open = TRUE)
Current Session Information

─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.2 (2023-10-31)
 os       macOS 26.0
 system   aarch64, darwin20
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       Europe/Amsterdam
 date     2025-09-29
 pandoc   3.4 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/aarch64/ (via rmarkdown)
 quarto   1.6.42 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/quarto

─ Packages ───────────────────────────────────────────────────────────────────
 package       * version date (UTC) lib source
 abind           1.4-8   2024-09-12 [1] CRAN (R 4.3.3)
 ape           * 5.8-1   2024-12-16 [1] CRAN (R 4.3.3)
 aplot           0.2.5   2025-02-27 [1] CRAN (R 4.3.3)
 backports       1.5.0   2024-05-23 [1] CRAN (R 4.3.3)
 bibtex          0.5.1   2023-01-26 [1] CRAN (R 4.3.3)
 blob            1.2.4   2023-03-17 [1] CRAN (R 4.3.3)
 bookdown        0.43    2025-04-15 [1] CRAN (R 4.3.3)
 broom           1.0.8   2025-03-28 [1] CRAN (R 4.3.3)
 bslib           0.9.0   2025-01-30 [1] CRAN (R 4.3.3)
 cachem          1.1.0   2024-05-16 [1] CRAN (R 4.3.3)
 car             3.1-3   2024-09-27 [1] CRAN (R 4.3.3)
 carData         3.0-5   2022-01-06 [1] CRAN (R 4.3.3)
 class           7.3-23  2025-01-01 [1] CRAN (R 4.3.3)
 classInt        0.4-11  2025-01-08 [1] CRAN (R 4.3.3)
 cli             3.6.5   2025-04-23 [1] CRAN (R 4.3.3)
 clipr           0.8.0   2022-02-22 [1] CRAN (R 4.3.3)
 codetools       0.2-20  2024-03-31 [1] CRAN (R 4.3.3)
 cowplot       * 1.1.3   2024-01-22 [1] CRAN (R 4.3.1)
 curl            6.2.3   2025-05-24 [1] CRAN (R 4.3.3)
 data.table      1.17.4  2025-05-26 [1] CRAN (R 4.3.3)
 data.tree       1.1.0   2023-11-12 [1] CRAN (R 4.3.3)
 DataExplorer  * 0.8.3   2024-01-24 [1] CRAN (R 4.3.1)
 DBI             1.2.3   2024-06-02 [1] CRAN (R 4.3.3)
 dbplyr          2.5.0   2024-03-19 [1] CRAN (R 4.3.1)
 desc            1.4.3   2023-12-10 [1] CRAN (R 4.3.3)
 details       * 0.4.0   2025-02-09 [1] CRAN (R 4.3.3)
 digest          0.6.37  2024-08-19 [1] CRAN (R 4.3.3)
 dplyr         * 1.1.4   2023-11-17 [1] CRAN (R 4.3.1)
 duckdb          1.3.2   2025-07-09 [1] CRAN (R 4.3.3)
 duckdbfs        0.1.0   2025-04-04 [1] CRAN (R 4.3.3)
 e1071           1.7-16  2024-09-16 [1] CRAN (R 4.3.3)
 evaluate        1.0.4   2025-06-18 [1] CRAN (R 4.3.3)
 farver          2.1.2   2024-05-13 [1] CRAN (R 4.3.3)
 fastmap         1.2.0   2024-05-15 [1] CRAN (R 4.3.3)
 fishualize    * 0.2.3   2022-03-08 [1] CRAN (R 4.3.0)
 Formula         1.2-5   2023-02-24 [1] CRAN (R 4.3.3)
 fs              1.6.6   2025-04-12 [1] CRAN (R 4.3.3)
 generics        0.1.4   2025-05-09 [1] CRAN (R 4.3.3)
 ggfun           0.1.8   2024-12-03 [1] CRAN (R 4.3.3)
 ggplot2       * 3.5.2   2025-04-09 [1] CRAN (R 4.3.3)
 ggplotify       0.1.2   2023-08-09 [1] CRAN (R 4.3.0)
 ggpubr        * 0.6.0   2023-02-10 [1] CRAN (R 4.3.0)
 ggsignif        0.6.4   2022-10-13 [1] CRAN (R 4.3.0)
 ggthemes      * 5.1.0   2024-02-10 [1] CRAN (R 4.3.1)
 ggtree        * 3.10.1  2024-02-27 [1] Bioconductor 3.18 (R 4.3.2)
 glue            1.8.0   2024-09-30 [1] CRAN (R 4.3.3)
 gridExtra       2.3     2017-09-09 [1] CRAN (R 4.3.3)
 gridGraphics    0.5-1   2020-12-13 [1] CRAN (R 4.3.3)
 gtable          0.3.6   2024-10-25 [1] CRAN (R 4.3.3)
 htmltools       0.5.8.1 2024-04-04 [1] CRAN (R 4.3.3)
 htmlwidgets     1.6.4   2023-12-06 [1] CRAN (R 4.3.1)
 httr            1.4.7   2023-08-15 [1] CRAN (R 4.3.0)
 igraph          2.1.4   2025-01-23 [1] CRAN (R 4.3.3)
 jquerylib       0.1.4   2021-04-26 [1] CRAN (R 4.3.3)
 jsonlite        2.0.0   2025-03-27 [1] CRAN (R 4.3.3)
 kableExtra    * 1.4.0   2024-01-24 [1] CRAN (R 4.3.1)
 KernSmooth      2.23-26 2025-01-01 [1] CRAN (R 4.3.3)
 knitr           1.50    2025-03-16 [1] CRAN (R 4.3.3)
 labeling        0.4.3   2023-08-29 [1] CRAN (R 4.3.3)
 lattice         0.22-7  2025-04-02 [1] CRAN (R 4.3.3)
 lazyeval        0.2.2   2019-03-15 [1] CRAN (R 4.3.3)
 lifecycle       1.0.4   2023-11-07 [1] CRAN (R 4.3.3)
 lubridate       1.9.4   2024-12-08 [1] CRAN (R 4.3.3)
 magrittr        2.0.3   2022-03-30 [1] CRAN (R 4.3.3)
 maps            3.4.3   2025-05-26 [1] CRAN (R 4.3.3)
 memoise         2.0.1   2021-11-26 [1] CRAN (R 4.3.3)
 networkD3       0.4.1   2025-04-14 [1] CRAN (R 4.3.3)
 nlme            3.1-168 2025-03-31 [1] CRAN (R 4.3.3)
 patchwork       1.3.0   2024-09-16 [1] CRAN (R 4.3.3)
 pillar          1.10.2  2025-04-05 [1] CRAN (R 4.3.3)
 pkgconfig       2.0.3   2019-09-22 [1] CRAN (R 4.3.3)
 plyr            1.8.9   2023-10-02 [1] CRAN (R 4.3.3)
 png             0.1-8   2022-11-29 [1] CRAN (R 4.3.3)
 proxy           0.4-27  2022-06-09 [1] CRAN (R 4.3.3)
 purrr           1.0.4   2025-02-05 [1] CRAN (R 4.3.3)
 R6              2.6.1   2025-02-15 [1] CRAN (R 4.3.3)
 ragg            1.4.0   2025-04-10 [1] CRAN (R 4.3.3)
 RColorBrewer    1.1-3   2022-04-03 [1] CRAN (R 4.3.3)
 Rcpp            1.1.0   2025-07-02 [1] CRAN (R 4.3.3)
 RefManageR    * 1.4.0   2022-09-30 [1] CRAN (R 4.3.0)
 rfishbase     * 5.0.1   2025-01-12 [1] CRAN (R 4.3.3)
 rlang           1.1.6   2025-04-11 [1] CRAN (R 4.3.3)
 rmarkdown       2.29    2024-11-04 [1] CRAN (R 4.3.3)
 rnaturalearth * 1.0.1   2023-12-15 [1] CRAN (R 4.3.1)
 rstatix         0.7.2   2023-02-01 [1] CRAN (R 4.3.0)
 rstudioapi      0.17.1  2024-10-22 [1] CRAN (R 4.3.3)
 sass            0.4.10  2025-04-11 [1] CRAN (R 4.3.3)
 scales          1.4.0   2025-04-24 [1] CRAN (R 4.3.3)
 sessioninfo   * 1.2.3   2025-02-05 [1] CRAN (R 4.3.3)
 sf              1.0-21  2025-05-15 [1] CRAN (R 4.3.3)
 stringi         1.8.7   2025-03-27 [1] CRAN (R 4.3.3)
 stringr         1.5.1   2023-11-14 [1] CRAN (R 4.3.1)
 svglite         2.2.1   2025-05-12 [1] CRAN (R 4.3.3)
 systemfonts     1.2.3   2025-04-30 [1] CRAN (R 4.3.3)
 terra           1.8-50  2025-05-09 [1] CRAN (R 4.3.3)
 textshaping     1.0.1   2025-05-01 [1] CRAN (R 4.3.3)
 tibble        * 3.2.1   2023-03-20 [1] CRAN (R 4.3.0)
 tidygeocoder  * 1.0.6   2025-03-31 [1] CRAN (R 4.3.3)
 tidyr           1.3.1   2024-01-24 [1] CRAN (R 4.3.1)
 tidyselect      1.2.1   2024-03-11 [1] CRAN (R 4.3.1)
 tidytree        0.4.6   2023-12-12 [1] CRAN (R 4.3.1)
 timechange      0.3.0   2024-01-18 [1] CRAN (R 4.3.3)
 treeio          1.26.0  2023-11-06 [1] Bioconductor
 units           0.8-7   2025-03-11 [1] CRAN (R 4.3.3)
 utf8            1.2.5   2025-05-01 [1] CRAN (R 4.3.3)
 vctrs           0.6.5   2023-12-01 [1] CRAN (R 4.3.3)
 viridisLite     0.4.2   2023-05-02 [1] CRAN (R 4.3.3)
 withr           3.0.2   2024-10-28 [1] CRAN (R 4.3.3)
 writexl         1.5.4   2025-04-15 [1] CRAN (R 4.3.3)
 xfun            0.52    2025-04-02 [1] CRAN (R 4.3.3)
 xml2            1.3.8   2025-03-14 [1] CRAN (R 4.3.3)
 yaml            2.3.10  2024-07-26 [1] CRAN (R 4.3.3)
 yulab.utils     0.2.0   2025-01-29 [1] CRAN (R 4.3.3)

 [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
 * ── Packages attached to the search path.

──────────────────────────────────────────────────────────────────────────────